On the predictability of protein database search complexity and its relevance to optimization of distributed searches.

J Proteome Res

Diversa Corporation, San Diego, California 92121, USA.

Published: September 2007

We discuss several aspects of load balancing for database search jobs in a distributed computing environment, such as a Linux cluster. Load balancing is a technique for making the most of multiple computational resources, and it is particularly relevant in environments where those resources are heavily used. The particular case of the Sequest program is considered here, but the general methodology should apply to any similar database search program. We show how the runtimes of Sequest searches of tandem mass spectral data can be predicted from profiles of previous representative searches, and how this information can be used to better balance the load for novel data. A well-known heuristic load balancing method is shown to be applicable to this problem, and its performance is analyzed for a variety of search parameters.
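To make the idea of balancing jobs by predicted runtime concrete, here is a minimal sketch in Python. The abstract does not name the heuristic it uses, so this example assumes one common choice, the greedy "longest processing time first" (LPT) rule: sort jobs by descending predicted runtime and give each one to the currently least-loaded worker. The function name `lpt_schedule`, its parameters, and the sample runtimes are hypothetical and are not taken from the paper.

```python
import heapq


def lpt_schedule(predicted_runtimes, n_workers):
    """Greedy LPT assignment (an assumed heuristic, not necessarily the paper's).

    predicted_runtimes: list of predicted runtimes, one per search job.
    n_workers: number of cluster nodes available.
    Returns (assignment, loads), where assignment[w] lists the job indices
    given to worker w and loads[w] is that worker's total predicted runtime.
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Visit jobs from longest to shortest predicted runtime.
    order = sorted(
        range(len(predicted_runtimes)),
        key=lambda j: predicted_runtimes[j],
        reverse=True,
    )
    for j in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(j)
        heapq.heappush(heap, (load + predicted_runtimes[j], w))

    loads = [sum(predicted_runtimes[j] for j in jobs) for jobs in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical predicted runtimes (arbitrary units) for five search jobs.
    assignment, loads = lpt_schedule([7, 5, 4, 3, 1], n_workers=2)
    print(assignment, loads)
```

The quality of such a schedule depends on how accurate the runtime predictions are, which is why predicting runtimes from earlier representative searches matters for the approach described above.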

Source
http://dx.doi.org/10.1021/pr070066u

Publication Analysis

Top Keywords

load balancing: 16
database search: 12
predictability protein: 4
protein database: 4
search: 4
search complexity: 4
complexity relevance: 4
relevance optimization: 4
optimization distributed: 4
distributed searches: 4
