On subset seeds for protein alignment.

Mikhail Roytberg, Anna Gambin, Laurent Noé, Slawomir Lasota, Eugenia Furletova, Ewa Szczurek, Gregory Kucherov

IEEE/ACM Trans Comput Biol Bioinform

Institute of Mathematical Problems in Biology, Pushchino, Moscow Region 142290, Russia.

Published: November 2009

We apply the concept of subset seeds to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets for constructing seeds with optimal sensitivity/selectivity trade-offs. We propose several design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method, as well as with the family of vector seeds. Although the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several major databases of protein alignments. Here again, the results show comparable or better performance of our seeds versus Blastp.
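The contrast drawn in the abstract between the subset-seed formalism and the cumulative principle can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the eight-residue toy alphabet, the residue grouping behind the `GROUPED` letter, the `toy_score` function (a stand-in, not BLOSUM62), and the threshold of 10. A subset seed is modeled as a sequence of letters, each letter being a set of residue pairs it accepts; a cumulative seed accepts a word pair when the summed score reaches a threshold.

```python
from itertools import combinations_with_replacement


def pair(a, b):
    """Canonical (sorted) form of an unordered residue pair."""
    return (a, b) if a <= b else (b, a)


def letter_from_groups(groups):
    """Build a seed letter: the set of residue pairs that fall in the same group."""
    pairs = set()
    for group in groups:
        for a, b in combinations_with_replacement(sorted(group), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical seed letters over a toy alphabet (not the paper's alphabets).
IDENTITY = letter_from_groups(["A", "D", "E", "I", "K", "L", "R", "V"])  # exact match only
GROUPED = letter_from_groups(["A", "DE", "ILV", "KR"])                   # match within a group


def subset_seed_hit(seed, x, y):
    """A subset seed hits when every aligned pair belongs to the letter at its position."""
    return all(pair(a, b) in letter for letter, a, b in zip(seed, x, y))


def toy_score(a, b):
    """Hypothetical scoring function, used only to illustrate the cumulative principle."""
    if a == b:
        return 4
    if pair(a, b) in GROUPED:
        return 2
    return -2


def cumulative_hit(x, y, threshold):
    """A cumulative seed hits when the summed pair scores reach the threshold."""
    return sum(toy_score(a, b) for a, b in zip(x, y)) >= threshold


# Seed of span 3: identity, grouped match, identity.
SEED = [IDENTITY, GROUPED, IDENTITY]

print(subset_seed_hit(SEED, "KLD", "KID"))  # grouped mismatch at the middle position
print(subset_seed_hit(SEED, "KLD", "RLD"))  # grouped mismatch at the first position
print(cumulative_hit("KLD", "RLD", 10))     # the cumulative score ignores the position
```

In this sketch the subset seed decides position by position, which amounts to one set-membership test per letter, whereas the cumulative criterion lets a strong match at one position compensate for a weak one elsewhere. That is one way to read the abstract's remark that subset seeds are less expressive but less costly to implement.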

DOI: http://dx.doi.org/10.1109/TCBB.2009.4
