A fast algorithm for exact sequence search in biological sequences using polyphase decomposition.

Abhilash Srikantha, Ajit S Bopardikar, Kalyan Kumar Kaipa, Parthasarathy Venkataraman, Kyusang Lee, TaeJin Ahn, Rangavittal Narayanan

Bioinformatics

Samsung Advanced Institute of Technology, Bangalore, Karnataka, India.

Published: September 2010

Exact sequence search lets a user locate a specific DNA subsequence within a larger DNA sequence or database. It is a vital building block in areas such as pharmacogenetics, phylogenetics and personal genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will grow sharply. In this context, fast sequence search algorithms will play an important role in exploiting the information contained in newly sequenced data. Many existing algorithms do not scale well to large sequences or databases because of their high computational cost. This article describes an efficient algorithm for fast searches on large DNA sequences. It builds hash tables of Q-grams after downsampling the database, which enables efficient search and memory use. The time complexity of pattern search is reduced using beam pruning techniques. Theoretical complexity calculations and performance figures are presented to indicate the potential of the proposed algorithm.
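
The idea of hashing Q-grams of a downsampled database can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_index` and `search`, the parameters `m` (downsampling factor) and `q` (Q-gram length), and the verification step are assumptions made for illustration, and beam pruning is omitted. The sketch keeps every `m`-th character of the database, indexes the Q-grams of that shorter sequence, and then splits the pattern into its `m` polyphase components. An occurrence at any database position must line up one of those components with the downsampled sequence, so each hash hit yields a candidate position that is checked against the full text.

```python
from collections import defaultdict


def build_index(text, m, q):
    """Hash the Q-grams of the downsampled text (every m-th character)."""
    down = text[::m]
    index = defaultdict(list)
    for s in range(len(down) - q + 1):
        index[down[s:s + q]].append(s)
    return index


def search(text, pattern, index, m, q):
    """Return sorted start positions of exact occurrences of pattern in text.

    Assumes index was built with the same m and q, and that the pattern
    has at least q * m characters so every phase yields a full Q-gram.
    """
    if len(pattern) < q * m:
        raise ValueError("pattern must have at least q * m characters")
    hits = set()
    for j in range(m):
        # Phase j of the pattern: characters j, j + m, j + 2m, ...
        key = pattern[j::m][:q]
        for s in index.get(key, ()):
            t = s * m - j  # candidate start position in the full text
            if t >= 0 and text.startswith(pattern, t):
                hits.add(t)
    return sorted(hits)


text = "ACGTACGTTAGCACGTACGA"
index = build_index(text, m=2, q=3)
print(search(text, "ACGTAC", index, m=2, q=3))  # [0, 12]
```

In this sketch the index holds roughly 1/m as many entries as a full Q-gram index, at the cost of m hash lookups per query and a verification pass over each candidate.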

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935425	PMC
http://dx.doi.org/10.1093/bioinformatics/btq364	DOI Listing

