Querying highly similar sequences.

Int J Comput Biol Drug Des

Department of Informatics, King's College London, London, UK.

Published: July 2013

In this paper, we present a solution to the extreme similarity sequencing problem. The problem consists of finding the occurrences of a pattern p in a set S(0), S(1), …, S(k) of sequences of equal length, where each S(i), 1 ≤ i ≤ k, differs from S(0) by a constant number of errors (around 10 in practice). We present an asymptotically fast O(n + occ log occ)-time algorithm, as well as a practical O(nk/w)-time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k, the number of sequences.
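
To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline. It is not the O(n + occ log occ) or O(nk/w) algorithm from the paper, whose details the abstract does not give. It assumes that every error is a single-character substitution and that each S(i) is stored as a dictionary mapping a position to the substituted character. The names `find_all`, `apply_substitutions`, and `query_similar` are hypothetical and chosen only for illustration. The idea is to search S(0) once, keep the occurrences that no substitution touches, and re-check only a small window around each substitution.

```python
from typing import Dict, List, Optional


def find_all(text: str, pattern: str, lo: int = 0, hi: Optional[int] = None) -> List[int]:
    """Return every start position of a non-empty pattern lying fully inside text[lo:hi]."""
    if hi is None:
        hi = len(text)
    hits = []
    i = text.find(pattern, lo, hi)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, hi)
    return hits


def apply_substitutions(reference: str, subs: Dict[int, str]) -> str:
    """Materialise a full variant sequence (used here only for cross-checking)."""
    chars = list(reference)
    for pos, ch in subs.items():
        chars[pos] = ch
    return "".join(chars)


def query_similar(reference: str, variants: List[Dict[int, str]], pattern: str) -> Dict[int, List[int]]:
    """Assumption: variants[i-1] holds the substitutions that turn S(0) into S(i).

    Returns a map from sequence index (0 is the reference) to the sorted
    start positions of the pattern in that sequence.
    """
    m, n = len(pattern), len(reference)
    ref_hits = find_all(reference, pattern)
    result = {0: ref_hits}
    for idx, subs in enumerate(variants, start=1):
        positions = sorted(subs)
        # Occurrences in S(0) that do not overlap any substituted position
        # are also occurrences in S(idx).
        found = {h for h in ref_hits if not any(h <= q < h + m for q in positions)}
        # Any other occurrence in S(idx) must overlap a substituted position,
        # so it lies inside the window [q - m + 1, q + m) for some q.
        for q in positions:
            lo, hi = max(0, q - m + 1), min(n, q + m)
            window = list(reference[lo:hi])
            for r in positions:
                if lo <= r < hi:
                    window[r - lo] = subs[r]
            for h in find_all("".join(window), pattern):
                found.add(lo + h)
        result[idx] = sorted(found)
    return result


if __name__ == "__main__":
    s0 = "ACGTTCGTAC"
    variants = [{4: "A"}, {1: "T"}]  # S(1) and S(2), as substitutions against S(0)
    print(query_similar(s0, variants, "ACG"))
```

With few errors per sequence, the work beyond the single scan of S(0) is confined to windows of length at most 2m - 1 around each substitution, where m is the pattern length, which is the intuition behind exploiting extreme similarity. The filtering step here is deliberately simple and is not tuned for speed.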

Source
http://dx.doi.org/10.1504/IJCBDD.2013.052206
