Structural variations (SVs) are diverse forms of genetic alteration that drive a wide range of human diseases. Accurately genotyping SVs from short-read sequencing data, particularly those occurring in repetitive genomic regions, remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It uses a dual-reference strategy, pairing a reference genome with an allele-based alternative genome, to engineer a curated set of genomic, alignment, and genotyping features. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to cross-species SV genotyping, with a weighted genotype concordance score of up to 90%. Notably, SVLearn genotypes SVs accurately at low sequencing coverage, with accuracy comparable to that at 30× coverage. Our results suggest that SVLearn can accelerate the study of associations between genome-scale, high-quality SV genotypes and diseases across multiple species.
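The abstract reports a weighted genotype concordance score without defining it. The sketch below is one plausible reading, assuming the definition commonly used in SV genotyping benchmarks: concordance is computed separately for each true genotype class (0/0, 0/1, 1/1) and the classes are averaged with equal weight, so the abundant 0/0 class cannot dominate the score. The function name and the toy genotypes are hypothetical and are not taken from SVLearn.

```python
from collections import defaultdict

# Bi-allelic genotype classes; assumed labels, not taken from the SVLearn code.
GENOTYPE_CLASSES = ("0/0", "0/1", "1/1")


def weighted_genotype_concordance(truth, called):
    """Average the per-class concordance over the genotype classes present in truth.

    Assumed definition: for each true genotype class, take the fraction of
    SVs whose called genotype matches, then average those fractions equally.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    total = defaultdict(int)
    correct = defaultdict(int)
    for true_gt, called_gt in zip(truth, called):
        if true_gt not in GENOTYPE_CLASSES:
            continue  # skip missing or multi-allelic truth genotypes
        total[true_gt] += 1
        if true_gt == called_gt:
            correct[true_gt] += 1

    per_class = [correct[g] / total[g] for g in GENOTYPE_CLASSES if total[g] > 0]
    if not per_class:
        raise ValueError("no usable truth genotypes")
    return sum(per_class) / len(per_class)


# Toy example: four 0/0, two 0/1 and one 1/1 truth genotypes.
truth = ["0/0", "0/0", "0/0", "0/0", "0/1", "0/1", "1/1"]
called = ["0/0", "0/0", "0/0", "0/1", "0/1", "1/1", "1/1"]
wgc = weighted_genotype_concordance(truth, called)
```

In this toy case the per-class concordances are 3/4, 1/2 and 1/1, so the equal-weight average differs from the plain fraction of correct calls (5/7).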


Source
http://dx.doi.org/10.1038/s41467-025-57756-z


Similar Publications

Interstitial lung disease (ILD) consists of a group of immune-mediated disorders that can cause inflammation and progressive fibrosis of the lungs, representing an area of unmet medical need given the lack of disease-modifying therapies and toxicities associated with current treatment options. Tissue-specific splice variants (SVs) of human aminoacyl-tRNA synthetases (aaRSs) are catalytic nulls thought to confer regulatory functions. One example from human histidyl-tRNA synthetase (HARS), termed HARS because the splicing event resulted in a protein encompassing the WHEP-TRS domain of HARS (a structurally conserved domain found in multiple aaRSs), is enriched in human lung and up-regulated by inflammatory cytokines in lung and immune cells.


Spatial 3D genome organization reveals intratumor heterogeneity in primary glioblastoma samples.

Sci Adv

March 2025

Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.

Glioblastoma (GBM) is the most prevalent malignant brain tumor with poor prognosis. Although chromatin intratumoral heterogeneity is a characteristic feature of GBM, most current studies are conducted at a single tumor site. To investigate the GBM-specific 3D genome organization and its heterogeneity, we conducted Hi-C experiments in 21 GBM samples from nine patients, along with three normal brain samples.


SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants.

Nat Commun

March 2025

Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China.


Comparison of Measurement Methods for Stone Volume Estimation: An Study.

J Endourol

March 2025

Progressive Endourological Association for Research and Leading Solutions (PEARLS), Paris, France.

Urolithiasis guidelines still rely on the maximum stone diameter to propose treatment strategy, although this measure is known to have many pitfalls. Stone volume (SV) could represent a more accurate measurement, helping to plan the treatment or follow-up. Various methods to measure SV have been proposed.


Rare pathogenic structural variants show potential to enhance prostate cancer germline testing for African men.

Nat Commun

March 2025

Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia.

Prostate cancer (PCa) is highly heritable, with men of African ancestry at greatest risk and associated lethality. Lack of representation in genomic data means germline testing guidelines exclude Africans. Although structural variations (SVs) are established as major contributors to human disease and prostate tumourigenesis, their role is under-appreciated in familial and therapeutic testing.

