DNA sequences alignment method using sparse index on pan-genome graph.

J Bioinform Comput Biol

School of Computer Science, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China.

Published: August 2024

A sequence graph represents the genetic variation of a pan-genome more concisely and space-efficiently than multiple linear reference genomes. To accelerate the alignment of reads to the graph, an index of the graph-based reference genome is used to obtain candidate locations. However, the potential combinatorial explosion of nodes in the sequence graph considerably increases the index size and the peak memory usage of the alignment process, especially for large-scale datasets. To address this, existing methods typically prune complex regions or extend the seed length, which sacrifices the recall of the alignment algorithm while reducing space usage only slightly. We present the index SIG and the alignment algorithm SIG-Aligner, capable of indexing and aligning at a lower memory cost. SIG builds a non-overlapping minimizer index inside the nodes of the sequence graph, and SIG-Aligner filters out most false-positive matches with a method based on the pigeonhole principle. In computational experiments, compared with Giraffe, SIG reduces index memory by 50% to 75% for human pan-genome graphs while preserving superior or comparable alignment accuracy and faster alignment time.
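The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the parameters `k` and `window`, the choice of one lexicographically smallest k-mer per disjoint window, and the use of plain strings in place of graph paths are all assumptions made for illustration.

```python
from collections import defaultdict


def node_minimizers(seq, k, window):
    """Pick one k-mer per disjoint window of a node's sequence (assumed scheme).

    Each chosen k-mer lies entirely inside its own window, so the chosen
    k-mers never overlap and never cross a node boundary.
    """
    if window < k:
        raise ValueError("window must be at least k")
    picks = []
    for start in range(0, len(seq) - k + 1, window):
        # Last start position whose k-mer still fits in this window and node.
        last = min(start + window, len(seq)) - k
        # Lexicographically smallest k-mer; ties go to the leftmost position.
        best = min(range(start, last + 1), key=lambda i: seq[i:i + k])
        picks.append((seq[best:best + k], best))
    return picks


def build_index(nodes, k, window):
    """Map each selected k-mer to the (node_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for node_id, seq in nodes.items():
        for kmer, offset in node_minimizers(seq, k, window):
            index[kmer].append((node_id, offset))
    return index


def pigeonhole_pieces(read, max_errors):
    """Split a read into max_errors + 1 disjoint pieces.

    With at most max_errors edits, at least one piece must be untouched.
    """
    n = max_errors + 1
    bounds = [len(read) * i // n for i in range(n + 1)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(n)]


def passes_filter(read, target, max_errors):
    """Keep a candidate region only if some piece of the read occurs in it exactly."""
    return any(piece in target for _, piece in pigeonhole_pieces(read, max_errors))
```

Under these assumptions, a candidate region that contains none of the `max_errors + 1` pieces cannot hold an alignment within the error budget, so it can be discarded before any costly alignment step. How SIG-Aligner actually applies the principle on graph paths is not described in the abstract.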

Source
http://dx.doi.org/10.1142/S0219720024500197
