Motivation: Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing (SRS) data. Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation.

Results: NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and from matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state of the art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to genotyping SVs exactly as described; it improves deletion genotyping concordance by a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecisely or incorrectly described SVs.
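
The image similarity framing can be illustrated with a minimal sketch. It is not the NPSV-deep implementation: it assumes each pileup image has already been reduced to a numeric embedding vector, and it uses plain Euclidean distance where NPSV-deep uses a learned similarity. The function names (`euclidean`, `genotype_by_similarity`), the genotype labels and the toy vectors are all hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def genotype_by_similarity(real_embedding, simulated_embeddings):
    """Pick the genotype whose simulated embeddings are closest to the real one.

    real_embedding: embedding of the pileup image from the actual SRS data.
    simulated_embeddings: dict mapping a genotype label to a list of
        embeddings of pileup images simulated under that genotype.
    Returns the best genotype and the mean distance for every genotype.
    """
    distances = {
        genotype: mean(euclidean(real_embedding, sim) for sim in sims)
        for genotype, sims in simulated_embeddings.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Toy data (assumed, for illustration only): two simulated replicates per genotype.
real = [0.9, 0.1, 0.5]
simulated = {
    "0/0": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]],
    "0/1": [[1.0, 0.0, 0.5], [0.8, 0.2, 0.5]],
    "1/1": [[1.0, 1.0, 1.0], [0.9, 1.0, 0.9]],
}

call, distances = genotype_by_similarity(real, simulated)
print(call, distances)
```

In this toy case the real embedding lies closest to the heterozygous simulations, so the sketch reports `0/1`. Averaging over several simulated replicates per genotype is a design choice of the sketch that dampens simulation noise.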

Availability and Implementation: Python/C++ source code and pre-trained models are freely available at https://github.com/mlinderm/npsv2.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10955255
DOI: http://dx.doi.org/10.1093/bioinformatics/btae129
