This study examines the precision of four well-known protein disorder predictors, ranked among the best-performing ones: DISOPRED2, PONDR VSL2B, IUPred and ESpritz. We address the systematic overestimation of the number of disordered proteins identified by these predictors, which are commonly treated as standards. Used with their default settings, some of these predictors have low precision and therefore tend to overestimate the occurrence of disordered proteins in genome-wide surveys. Moreover, different predictors often disagree in their evaluation of individual proteins. To address these problems, we propose a simple procedure, based on precision-recall curves, that enhances precision: we re-tuned the discriminative thresholds of the predictors by training and cross-validating their performance on a curated dataset. After re-tuning, both the disagreement among predictors and the tendency to overestimate the occurrence of disordered proteins are reduced. This is shown in a dedicated study of the human proteome and of a set of cancer-related human proteins, with no a priori disorder annotation. Simple quantitative estimates suggest that the occurrence of disorder among cancer-related proteins, and in other similar large-scale surveys, has been overestimated in the past.
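The idea of re-tuning a discriminative threshold from a precision-recall curve can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the per-protein scores, the labels and the target precision are all hypothetical, and the cross-validation step described above is omitted for brevity. The sketch scans candidate thresholds and returns the lowest one whose precision reaches a chosen target, which is also the qualifying threshold with the highest recall, because recall can only decrease as the threshold rises.

```python
from typing import List, Optional, Tuple


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when proteins scoring >= threshold are called disordered.

    labels use 1 for disordered and 0 for ordered (hypothetical annotation).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention (an assumption): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def retune_threshold(
    scores: List[float], labels: List[int], target_precision: float
) -> Optional[float]:
    """Return the lowest observed score usable as a threshold with precision >= target.

    Returns None if no candidate threshold reaches the target precision.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= target_precision:
            return t
    return None


# Hypothetical toy data: one score per protein from a single predictor.
toy_scores = [0.2, 0.4, 0.55, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1]

default_pr = precision_recall_at(toy_scores, toy_labels, 0.5)
retuned = retune_threshold(toy_scores, toy_labels, 0.9)
```

On this toy data, an assumed default threshold of 0.5 gives a precision of 0.75 with full recall, while a target precision of 0.9 moves the threshold to 0.7 at the cost of lower recall. In practice the threshold would be selected on training folds and checked on held-out folds.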
DOI: http://dx.doi.org/10.1142/S0219720012500230