PSI: indexing protein structures for fast similarity search.

Orhan Camoglu, Tamer Kahveci, Ambuj K. Singh

Bioinformatics

Department of Computer Science, University of California, Santa Barbara, CA 93106, USA.

Published: October 2004

Motivation: We consider the problem of finding similarities in protein structure databases. Current techniques compare the query protein sequentially against every protein in the database. As a result, the cost of a similarity query grows linearly with the size of the database. As experimentally determined and theoretically estimated protein structure databases keep growing, scalable search techniques are needed.

Results: Our techniques extract feature vectors from triplets of secondary structure elements (SSEs). These feature vectors are then indexed using a multidimensional index structure. For a given query protein, the index is used to quickly prune unpromising proteins from the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model that uses the SSEs to estimate the goodness of a match. Experimental results show that our techniques make the pruning step of VAST 3 to 3.5 times faster while maintaining similar sensitivity.
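The filter-and-refine idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PSI implementation: the abstract does not say which features are computed or which index structure is used, so this sketch assumes each SSE is reduced to a type ("H" or "E") and a 3D centroid, describes each SSE triplet by its sorted type string plus its three sorted pairwise centroid distances, and uses a simple uniform grid as a stand-in multidimensional index. All names (`triplet_features`, `GridIndex`, `build_index`, `prune`, `cell_size`, `eps`, `min_hits`) and the toy data are hypothetical.

```python
import itertools
import math
from collections import defaultdict

# ASSUMPTION: an SSE is (type, centroid), type "H" (helix) or "E" (strand),
# centroid an (x, y, z) tuple. This is not PSI's actual representation.


def triplet_features(sses):
    """Yield (type_key, vector) for every SSE triplet (hypothetical features)."""
    for a, b, c in itertools.combinations(sses, 3):
        types = "".join(sorted(s[0] for s in (a, b, c)))
        # Sorting the distances makes the vector independent of SSE order.
        dists = sorted((math.dist(a[1], b[1]),
                        math.dist(a[1], c[1]),
                        math.dist(b[1], c[1])))
        yield types, tuple(dists)


class GridIndex:
    """Uniform-grid multidimensional index (stand-in, not the paper's index)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, vec):
        return tuple(math.floor(v / self.cell_size) for v in vec)

    def insert(self, key, vec, protein_id):
        self.cells[(key, self._cell(vec))].append((vec, protein_id))

    def range_query(self, key, vec, eps):
        """Return IDs of proteins with a same-type vector within eps per dimension."""
        lo = self._cell([v - eps for v in vec])
        hi = self._cell([v + eps for v in vec])
        hits = set()
        # Visit only grid cells that can hold vectors within eps of the query.
        ranges = (range(lo_i, hi_i + 1) for lo_i, hi_i in zip(lo, hi))
        for cell in itertools.product(*ranges):
            for other, pid in self.cells.get((key, cell), ()):
                if max(abs(x - y) for x, y in zip(vec, other)) <= eps:
                    hits.add(pid)
        return hits


def build_index(database, cell_size=2.0):
    """Index every triplet vector of every protein in the database."""
    index = GridIndex(cell_size)
    for pid, sses in database.items():
        for key, vec in triplet_features(sses):
            index.insert(key, vec, pid)
    return index


def prune(index, query_sses, eps=1.0, min_hits=1):
    """Keep proteins matching at least min_hits of the query's triplets."""
    counts = defaultdict(int)
    for key, vec in triplet_features(query_sses):
        for pid in index.range_query(key, vec, eps):
            counts[pid] += 1
    return {pid for pid, n in counts.items() if n >= min_hits}


# Toy data (made up for illustration): the query is P1 shifted by (1, 1, 1).
database = {
    "P1": [("H", (0, 0, 0)), ("H", (10, 0, 0)), ("E", (0, 8, 0))],
    "P2": [("H", (0, 0, 0)), ("H", (30, 0, 0)), ("E", (0, 25, 0))],
}
query = [("H", (1, 1, 1)), ("H", (11, 1, 1)), ("E", (1, 9, 1))]

print(prune(build_index(database), query))  # {'P1'}
```

Because the triplet distances are invariant to translation, the shifted query still retrieves P1, while P2 is pruned without ever being aligned. In a full pipeline, only the surviving candidates would be passed to an aligner such as VAST; that refinement step and the paper's statistical model for scoring matches are not shown here.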

DOI: http://dx.doi.org/10.1093/bioinformatics/btg1009
