Protein ranking by semi-supervised network propagation.

BMC Bioinformatics

NEC LABS AMERICA, 4 Independence Way, Princeton, NJ, USA.

Published: March 2006

Background: Biologists regularly search DNA or protein databases for sequences that share an evolutionary or functional relationship with a given query sequence. Traditional search methods, such as BLAST and PSI-BLAST, focus on detecting statistically significant pairwise sequence alignments and often miss more subtle sequence similarity. Recent work in the machine learning community has shown that exploiting the global structure of the network defined by these pairwise similarities can help detect more remote relationships than a purely local measure.

Methods: We review RankProp, a ranking algorithm that exploits the global network structure of similarity relationships among proteins in a database by performing a diffusion operation on a protein similarity network with weighted edges. The original RankProp algorithm is unsupervised. Here, we describe a semi-supervised version of the algorithm that uses labeled examples. Three possible ways of incorporating label information are considered: (i) as a validation set for model selection, (ii) to learn a new network, by choosing which transfer function to use for a given query, and (iii) to estimate edge weights, which measure the probability of inferring structural similarity.
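As an illustration of the diffusion operation described above, here is a minimal Python sketch. It is not the authors' implementation: the exponential transfer function, the parameters `sigma` and `alpha`, the normalization of outgoing weights, and all function names are assumptions made for this example.

```python
import math


def transfer(e_value, sigma=100.0):
    """Assumed transfer function: map an E-value to an edge weight in (0, 1]."""
    return math.exp(-e_value / sigma)


def build_network(e_values, sigma=100.0):
    """Build weights[source][target] from a dict {(source, target): E-value}.

    Outgoing weights of each node are normalized to sum to 1 (an assumption),
    which keeps the diffusion bounded when alpha < 1.
    """
    weights = {}
    for (src, dst), e_value in e_values.items():
        weights.setdefault(src, {})[dst] = transfer(e_value, sigma)
    for targets in weights.values():
        total = sum(targets.values())
        if total > 0:
            for dst in targets:
                targets[dst] /= total
    return weights


def propagate(weights, query, alpha=0.9, iterations=100):
    """Diffuse activation from the query over the weighted network."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    scores = {node: 0.0 for node in nodes}
    scores[query] = 1.0
    direct = weights.get(query, {})
    for _ in range(iterations):
        # Each node starts from its direct similarity to the query ...
        updated = {node: direct.get(node, 0.0) for node in nodes}
        # ... and receives damped activation from its neighbours.
        for src, targets in weights.items():
            for dst, weight in targets.items():
                updated[dst] += alpha * weight * scores[src]
        updated[query] = 1.0  # the query's activation is held fixed
        scores = updated
    return scores


def rank(weights, query, **kwargs):
    """Return all non-query nodes, ordered by decreasing activation."""
    scores = propagate(weights, query, **kwargs)
    return sorted((n for n in scores if n != query), key=lambda n: -scores[n])


# Toy network: "b" has no direct edge to the query "q" but is close to "a",
# while "c" has only a weak direct match to "q".
toy = {
    ("q", "a"): 1e-3, ("a", "q"): 1e-3,
    ("a", "b"): 1e-3, ("b", "a"): 1e-3,
    ("q", "c"): 500.0, ("c", "q"): 500.0,
}
print(rank(build_network(toy), "q"))
```

In this toy network, the diffusion ranks `b` above `c` even though only `c` has a direct edge to the query. This is the kind of remote relationship that a purely pairwise score would miss.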

Results: Benchmarked on a human-curated database of protein structures, the original RankProp algorithm provides significant improvement over local network search algorithms such as PSI-BLAST. Furthermore, we show here that labeled data can be used to learn a network without any need for estimating parameters of the transfer function, and that diffusion on this learned network produces better results than the original RankProp algorithm with a fixed network.

Conclusion: In order to gain maximal information from a network, labeled and unlabeled data should be used to extract both local and global structure.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810311
DOI: http://dx.doi.org/10.1186/1471-2105-7-S1-S10

Publication Analysis

Top Keywords

original rankprop: 12
rankprop algorithm: 12
network: 9
global structure: 8
learn network: 8
transfer function: 8
algorithm: 5
protein: 4
protein ranking: 4
ranking semi-supervised: 4

Similar Publications

RANKPROP: a web server for protein remote homology detection.

Bioinformatics

January 2009

NEC Laboratories of America, Princeton, NJ, USA.

Unlabelled: We present a large-scale implementation of the Rankprop protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the Rankprop algorithm, whereas previously, results were only reported for a database of 108 000 proteins.

Motivation: Identifying protein orthologs is an important task that is receiving growing attention in the bioinformatics literature. Orthology detection provides a fundamental tool towards understanding protein evolution, predicting protein functions and interactions, aligning protein-protein interaction (PPI) networks of different species and detecting conserved modules within these networks.

Results: Here, we present a novel diffusion-based framework that builds on the Rankprop algorithm for protein orthology detection and enhances it in several important ways.
