Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for a wider range of important species. In this paper, we present an automatic algorithm called RAfilter for removing false positives from the outputs of existing aligners. RAfilter uses rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Because large eukaryotic genomes contain huge numbers of rare k-mers, a series of high-performance computing techniques, such as multi-threading and bit operations, are used to improve time and space efficiency. The experimental results on tandem repeats and interspersed repeats show that RAfilter filtered 60%-90% of false-positive HiFi alignments while removing almost no correct ones, whereas on ONT datasets the sensitivity and precision were about 80% and 50%, respectively.
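The idea of using rare k-mers as copy-specific markers can be illustrated with a small Python sketch. This is not RAfilter's implementation: the function names, the `(ref_start, ref_end)` alignment tuples, the `max_count` and `min_support` thresholds, and the toy sequences are all assumptions made for illustration, and the sketch ignores strand, sequencing errors, and the multi-threading and bit-level encoding mentioned above.

```python
from collections import Counter


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def rare_kmer_index(reference, k, max_count=1):
    """Map reference position -> k-mer for k-mers seen at most max_count times."""
    counts = Counter(kmer for _, kmer in kmers(reference, k))
    return {pos: kmer for pos, kmer in kmers(reference, k)
            if counts[kmer] <= max_count}


def rare_kmer_support(read, ref_start, ref_end, rare_by_pos, k):
    """Fraction of rare k-mers fully inside [ref_start, ref_end) found in the read.

    Returns None when the interval holds no rare k-mers (nothing to judge by).
    """
    read_kmers = {kmer for _, kmer in kmers(read, k)}
    expected = [kmer for pos, kmer in rare_by_pos.items()
                if ref_start <= pos and pos + k <= ref_end]
    if not expected:
        return None
    return sum(kmer in read_kmers for kmer in expected) / len(expected)


def filter_alignments(read, alignments, rare_by_pos, k, min_support=0.5):
    """Keep alignments whose rare-k-mer support is unknown or >= min_support."""
    kept = []
    for ref_start, ref_end in alignments:
        support = rare_kmer_support(read, ref_start, ref_end, rare_by_pos, k)
        if support is None or support >= min_support:
            kept.append((ref_start, ref_end))
    return kept


# Toy reference: two repeat copies that differ by a single base (G vs C).
copy_a = "ACGTACGGTTCAGT"
copy_b = "ACGTACGCTTCAGT"
reference = copy_a + "AAAAAAAA" + copy_b   # copy_b starts at position 22
read = copy_b                              # the read truly comes from copy_b

K = 5
rare = rare_kmer_index(reference, K)
candidates = [(0, 14), (22, 36)]           # an aligner might report both copies
kept = filter_alignments(read, candidates, rare, K)
print(kept)  # [(22, 36)]
```

In this toy case only the k-mers overlapping the single differing base are rare, so the alignment to the wrong copy has zero support and is dropped. For noisier reads a lower `min_support` would presumably be needed, which is consistent with the weaker ONT results reported above.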

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107899
DOI: http://dx.doi.org/10.1093/hr/uhac288
