Structural variations (SVs) are a diverse class of genetic alterations that drive a wide range of human diseases. Accurately genotyping SVs from short-read sequencing data remains challenging, particularly for SVs in repetitive genomic regions. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It uses a dual-reference strategy, pairing a reference genome with an allele-based alternative genome, to engineer a curated set of genomic, alignment, and genotyping features. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to cross-species SV genotyping, with a weighted genotype concordance score of up to 90%. Notably, SVLearn genotypes SVs accurately at low sequencing coverage, with accuracy comparable to that at 30× coverage. Our results suggest that SVLearn can accelerate the study of associations between high-quality, genome-scale SV genotypes and diseases across multiple species.
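The abstract reports a weighted genotype concordance score without defining it. The sketch below assumes the definition commonly used in SV genotyping benchmarks: concordance is computed separately for each truth genotype class (0/0, 0/1, 1/1) and the per-class values are averaged, so that the abundant 0/0 class does not dominate. The function name and the toy genotypes are hypothetical and are not taken from SVLearn.

```python
from collections import defaultdict


def weighted_genotype_concordance(truth, called):
    """Average the per-class concordance over the truth genotype classes.

    Assumed definition (not confirmed by the abstract): for each genotype
    class present in `truth`, compute the fraction of calls that match,
    then take the unweighted mean of those fractions.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    totals = defaultdict(int)   # number of truth genotypes per class
    correct = defaultdict(int)  # number of matching calls per class
    for t, c in zip(truth, called):
        totals[t] += 1
        if t == c:
            correct[t] += 1
    if not totals:
        raise ValueError("no genotypes to compare")
    per_class = [correct[g] / totals[g] for g in totals]
    return sum(per_class) / len(per_class)


# Toy data (hypothetical): six 0/0 sites, one 0/1 site, one 1/1 site.
truth = ["0/0"] * 6 + ["0/1", "1/1"]
called = ["0/0"] * 6 + ["0/1", "0/1"]

wgc = weighted_genotype_concordance(truth, called)
accuracy = sum(t == c for t, c in zip(truth, called)) / len(truth)
print(f"wGC = {wgc:.3f}, plain accuracy = {accuracy:.3f}")
```

In this toy case the single missed 1/1 call lowers the class-averaged score far more than it lowers plain accuracy, which is why a class-averaged measure is often preferred when most sites are homozygous reference.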
DOI: http://dx.doi.org/10.1038/s41467-025-57756-z
Sci Transl Med
March 2025
Department of Molecular Medicine, Scripps Research Institute, La Jolla, CA 92037, USA.
Interstitial lung disease (ILD) consists of a group of immune-mediated disorders that can cause inflammation and progressive fibrosis of the lungs, representing an area of unmet medical need given the lack of disease-modifying therapies and toxicities associated with current treatment options. Tissue-specific splice variants (SVs) of human aminoacyl-tRNA synthetases (aaRSs) are catalytic nulls thought to confer regulatory functions. One example from human histidyl-tRNA synthetase (HARS), termed HARS because the splicing event resulted in a protein encompassing the WHEP-TRS domain of HARS (a structurally conserved domain found in multiple aaRSs), is enriched in human lung and up-regulated by inflammatory cytokines in lung and immune cells.
Sci Adv
March 2025
Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
Glioblastoma (GBM) is the most prevalent malignant brain tumor with poor prognosis. Although chromatin intratumoral heterogeneity is a characteristic feature of GBM, most current studies are conducted at a single tumor site. To investigate the GBM-specific 3D genome organization and its heterogeneity, we conducted Hi-C experiments in 21 GBM samples from nine patients, along with three normal brain samples.
Nat Commun
March 2025
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China.
Structural variations (SVs) are diverse forms of genetic alterations and drive a wide range of human diseases. Accurately genotyping SVs, particularly occurring at repetitive genomic regions, from short-read sequencing data remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs.
J Endourol
March 2025
Progressive Endourological Association for Research and Leading Solutions (PEARLS), Paris, France.
Urolithiasis guidelines still rely on the maximum stone diameter to propose a treatment strategy, although this measure is known to have many pitfalls. Stone volume (SV) could be a more accurate measurement and could help in planning treatment or follow-up. Various methods to measure SV have been proposed.
Nat Commun
March 2025
Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia.
Prostate cancer (PCa) is highly heritable, with men of African ancestry at greatest risk of both disease and associated lethality. Lack of representation in genomic data means that germline testing guidelines exclude Africans. Although structural variations (SVs) are established as major contributors to human disease and prostate tumourigenesis, their role is under-appreciated in familial and therapeutic testing.