Tuning the precision of predictors to reduce overestimation of protein disorder over large datasets.

J Bioinform Comput Biol

Physics Department, Sapienza University of Rome, Rome, Italy.

Published: April 2013

This study examines the precision of four widely used protein disorder predictors that rank among the best performing: DISOPRED2, PONDR VSL2B, IUPred and ESpritz. We address the systematic overestimation of the number of disordered proteins identified when these predictors are used as a standard. With their default settings, some of these predictors have low precision, which leads to overestimating the occurrence of disordered proteins in genome-wide surveys. Moreover, different predictors often disagree in their evaluation of individual proteins. To address these problems, and to propose a simple procedure that enhances precision based on precision-recall curves, we re-tuned the discriminative thresholds of the predictors by training and cross-validating their performance on a curated dataset. After re-tuning, both the disagreement among predictors and the tendency to overestimate the occurrence of disordered proteins are reduced. We show this in a dedicated study of the human proteome and of a set of cancer-related human proteins, with no a priori disorder annotation. Simple quantitative estimates suggest that the occurrence of disorder among cancer-related proteins, and in other similar large-scale surveys, has been overestimated in the past.
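The idea of re-tuning a discriminative threshold from a precision-recall curve can be illustrated with a minimal sketch. It is not the procedure used in the study: the function names, the toy scores and labels, and the target precision are all hypothetical, and the cross-validation step mentioned in the abstract is omitted. The sketch assumes each protein has a single predictor score and a binary label (1 for disordered, 0 for ordered), and that a protein is called disordered when its score is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for every observed score used as a cutoff.

    A protein is called disordered when its score >= threshold.
    """
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        recall = tp / total_pos if total_pos else 0.0
        points.append((t, precision, recall))
    return points


def retune_threshold(
    scores: Sequence[float], labels: Sequence[int], target_precision: float
) -> float:
    """Return the lowest threshold whose precision meets the target.

    Recall never increases as the threshold rises, so the lowest
    qualifying threshold keeps recall as high as possible.
    """
    for t, precision, _recall in precision_recall_points(scores, labels):
        if precision >= target_precision:
            return t
    raise ValueError("no threshold reaches the requested precision")


# Hypothetical toy data, made up for illustration only.
toy_scores = [0.1, 0.3, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Assumed target precision of 0.9; the study's actual targets are not given here.
retuned = retune_threshold(toy_scores, toy_labels, target_precision=0.9)
```

On this toy data the stricter threshold trades some recall for fewer false positives, which is the trade-off the abstract describes. In practice the threshold would be chosen on training folds and checked on held-out folds.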

DOI: http://dx.doi.org/10.1142/S0219720012500230

