Long-read sequencing enables variant detection in genomic regions that are difficult to map with short-read sequencing. To fully exploit the benefits of longer reads, we present NanoCaller, a deep learning method that detects SNPs using long-range haplotype information, then phases long reads with the called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome that could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.
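
The middle step of the pipeline described above, phasing long reads with called SNPs, can be illustrated with a toy sketch. This is not NanoCaller's implementation: the function name `phase_read`, the dictionary-based inputs, and the simple majority vote are all assumptions made for illustration, and a real phasing tool would also weigh base qualities and alignment information.

```python
from collections import Counter


def phase_read(read_alleles, phased_snps):
    """Assign a read to haplotype 1 or 2, or None if it is uninformative.

    read_alleles: dict mapping reference position -> base seen in the read
    phased_snps:  dict mapping reference position -> (hap1_allele, hap2_allele)
    Both structures are hypothetical, chosen only to keep the sketch small.
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue  # position is not a called heterozygous SNP
        hap1_allele, hap2_allele = phased_snps[pos]
        if base == hap1_allele:
            votes[1] += 1
        elif base == hap2_allele:
            votes[2] += 1
        # any other base is treated as a sequencing error and ignored
    if votes[1] == votes[2]:
        return None  # tie, or no informative SNPs covered by the read
    return 1 if votes[1] > votes[2] else 2


# Made-up heterozygous SNPs with their alleles on each haplotype.
phased_snps = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}

# Made-up reads, reduced to the bases they show at a few positions.
reads = {
    "read_a": {100: "A", 250: "C", 400: "G"},
    "read_b": {100: "G", 250: "T"},
    "read_c": {100: "A", 250: "T"},
    "read_d": {999: "A"},
}

assignments = {name: phase_read(alleles, phased_snps) for name, alleles in reads.items()}
```

Because a long read often spans several heterozygous SNPs, even a simple vote like this can separate reads by haplotype; reads with conflicting or no informative sites (`read_c` and `read_d` here) are left unassigned.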

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8419925
DOI: http://dx.doi.org/10.1186/s13059-021-02472-2

