Mapping-friendly sequence reductions: Going beyond homopolymer compression.

iScience

Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.

Published: November 2022

Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for mitigating the homopolymer expansion/contraction errors present in long reads is homopolymer compression. It collapses each run of repeated nucleotides into a single nucleotide, which removes some sequencing errors and improves mapping sensitivity. Although intuition explains why homopolymer compression works, it does not imply that this is the best possible transformation. In this paper, we explore whether other transformations, applied in the same pre-processing manner as homopolymer compression, can achieve better alignment sensitivity. We introduce a framework that generalizes homopolymer compression, called mapping-friendly sequence reductions. We transform both the reference and the reads using these reductions and then apply an alignment algorithm. We demonstrate that some mapping-friendly sequence reductions improve mapping accuracy, outperforming homopolymer compression.
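To make the pre-processing idea concrete, here is a minimal Python sketch of homopolymer compression, together with one possible way to express it as a more general window-based reduction. The names `homopolymer_compress`, `apply_reduction`, and `hpc_rule`, and the choice of a sliding window of two characters, are illustrative assumptions and are not taken from the abstract; the paper's formal definition of a mapping-friendly sequence reduction may differ.

```python
from itertools import groupby
from typing import Callable


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical nucleotides to a single character."""
    return "".join(base for base, _ in groupby(seq))


def apply_reduction(seq: str, g: Callable[[str], str], order: int = 2) -> str:
    """Hypothetical generic reduction (an assumption for illustration).

    Keeps the first `order - 1` characters, then slides a window of
    `order` characters over `seq` and concatenates g(window).
    g may return "" to emit nothing for a window.
    """
    if len(seq) < order:
        return seq
    head = seq[: order - 1]
    body = "".join(g(seq[i : i + order]) for i in range(len(seq) - order + 1))
    return head + body


def hpc_rule(window: str) -> str:
    """Window rule that reproduces homopolymer compression:
    emit the second character only when it differs from the first."""
    return window[1] if window[0] != window[1] else ""


# Toy reference and a read with homopolymer-length errors
# (one T dropped from the T run, one C added to the C run).
reference = "GATTTACCA"
read = "GATTACCCA"

reduced_reference = homopolymer_compress(reference)
reduced_read = homopolymer_compress(read)
```

In this toy case both sequences reduce to the same string, so an aligner run on the reduced reference and reduced read would no longer see the homopolymer-length differences. Swapping `hpc_rule` for a different window rule is one way to picture exploring alternative reductions.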


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633736
DOI: http://dx.doi.org/10.1016/j.isci.2022.105305
