Correcting BLAST e-values for low-complexity segments.

J Comput Biol

Department of Computer Science, Technion, Haifa, Israel.

Published: September 2005


Article Abstract

The statistical estimates of BLAST and PSI-BLAST are essential for determining the biological relevance of sequence matches. Although these estimates are very effective for evaluating most matches, they usually overestimate the significance of matches when low-complexity segments are present. In this paper, we present a model, based on divergence measures and statistics of the alignment structure, that corrects BLAST e-values for low-complexity sequences without filtering or excluding them, and that produces scores that are more effective at distinguishing true similarities from chance similarities. We evaluate our method and compare it with other known methods, using the Gene Ontology (GO) knowledge resource as a benchmark. Various performance measures, including ROC analysis, indicate that the new model improves upon the state of the art. The program is available at biozon.org/ftp/ and www.cs.technion.ac.il/~itaish/lowcomp/.
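The abstract does not spell out the correction model itself. The sketch below only illustrates two standard ingredients that the abstract alludes to, and it is not the authors' method. The first is the Karlin-Altschul formula that BLAST uses to turn a raw alignment score S between sequences of lengths m and n into an e-value, E = K·m·n·exp(-λS). The second is one example of a divergence measure: the relative entropy (Kullback-Leibler divergence) between a segment's residue composition and a background distribution. Low-complexity segments such as poly-Q runs sit far from typical composition, so this value is large for them. The function names, the uniform background, and the λ and K values in the usage lines are hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter


def karlin_altschul_evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def composition_divergence(seq: str, background: dict[str, float]) -> float:
    """Relative entropy D(p || q), in bits, between the residue composition p of seq
    and a background distribution q. Residues absent from seq contribute 0."""
    counts = Counter(seq)
    total = len(seq)
    d = 0.0
    for residue, c in counts.items():
        p = c / total
        q = background[residue]  # assumes every residue in seq has a background entry
        d += p * math.log2(p / q)
    return d


# Hypothetical uniform background over the 20 standard amino acids (illustration only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}

if __name__ == "__main__":
    # A homopolymer is maximally far from the uniform background: log2(20) bits.
    print(composition_divergence("QQQQQQQQ", UNIFORM))
    # A segment using each residue once matches the uniform background exactly: 0 bits.
    print(composition_divergence(AMINO_ACIDS, UNIFORM))
    # Placeholder lambda and K; real values depend on the scoring matrix and gap costs.
    print(karlin_altschul_evalue(50, 300, 1_000_000, lam=0.3, k=0.1))
```

The point of the pairing is that the e-value formula depends only on the score and the sequence lengths, so it cannot reflect how compositionally biased the aligned segments are. A composition statistic like the one above is one conceivable signal for detecting that bias.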


Source
http://dx.doi.org/10.1089/cmb.2005.12.980
