A major challenge of long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi read assembly.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7792008
DOI: http://dx.doi.org/10.1186/s13059-020-02244-4
