Querying highly similar sequences.

Carl Barton, Mathieu Giraud, Costas S Iliopoulos, Thierry Lecroq, Laurent Mouchard, Solon P Pissis

Int J Comput Biol Drug Des

Department of Informatics, King's College London, London, UK.

Published: July 2013

In this paper, we present a solution to the extreme similarity sequencing problem. The problem consists of finding occurrences of a pattern p in a set S(0), S(1), ..., S(k) of sequences of equal length, where each S(i), for 1≤i≤k, differs from S(0) by a constant number of errors (around 10 in practice). We present an asymptotically fast O(n + occ log occ)-time algorithm, as well as a practical O(nk/w)-time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k, the number of sequences.
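To make the problem statement concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes that every error is a single-character substitution (consistent with the sequences having equal length) and that each S(i) is stored only as its differences from S(0). The names `find_all` and `query_similar` and the dictionary representation of the differences are hypothetical choices made for this example. The sketch searches S(0) once, then rechecks only those windows of each S(i) that overlap a substituted position; it does not reproduce the O(n + occ log occ) or O(nk/w) bounds stated in the abstract.

```python
def find_all(text, pattern):
    """Return start positions of all (possibly overlapping) exact occurrences."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def query_similar(reference, variants, pattern):
    """Map each sequence index i to the sorted start positions of pattern in S(i).

    reference -- S(0) as a string
    variants  -- one dict per S(i), i >= 1, mapping position -> substituted
                 character (assumption: errors are substitutions only)
    """
    m, n = len(pattern), len(reference)
    if m == 0 or m > n:
        raise ValueError("pattern must be non-empty and no longer than the sequences")

    ref_hits = find_all(reference, pattern)  # one search over S(0)
    results = {0: ref_hits}

    for i, subs in enumerate(variants, start=1):
        # Window start positions whose length-m window covers a substitution.
        touched = set()
        for p in subs:
            touched.update(range(max(0, p - m + 1), min(p, n - m) + 1))

        # Occurrences in S(0) that avoid every substitution carry over unchanged.
        hits = [h for h in ref_hits if h not in touched]

        # Recheck only the touched windows against the modified sequence.
        for start in touched:
            window = "".join(subs.get(j, reference[j]) for j in range(start, start + m))
            if window == pattern:
                hits.append(start)

        results[i] = sorted(hits)
    return results
```

With a constant number of substitutions per sequence, each S(i) contributes only a constant number of windows of length m to recheck after the single pass over S(0), which is the intuition behind treating the set as one reference plus small differences.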

DOI: http://dx.doi.org/10.1504/IJCBDD.2013.052206
