Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P << N.
Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1093/bioinformatics/bti323 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!