Using sequence compression to speedup probabilistic profile matching.

Bioinformatics

Information Science and Technology Institute, University of Urbino, 61029 Urbino, Italy.

Published: May 2005

Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P << N.

Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/bti323DOI Listing

Publication Analysis

Top Keywords

probabilistic profile
8
sequence compression
4
compression speedup
4
speedup probabilistic
4
profile
4
profile matching
4
matching motivation
4
motivation matching
4
matching biological
4
biological sequence
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!