Protein-protein interaction site prediction using random forest proximity distance.

J Bioinform Comput Biol

College of Food and Bioengineering, Henan University of Science and Technology, Luoyang, P. R. China.

Published: February 2021

A front-end method based on random forest proximity distance (PD) is used to screen the test set and thereby improve protein-protein interaction site (PPIS) prediction. Distance metrics are assessed under the assumption that a higher-quality distance definition leads to higher classification performance. On an independent test set, numerical analysis based on statistical inference shows that PD has an advantage over the Mahalanobis and cosine distances. Because the proximity distance depends on the tree composition of the random forest model, an iterative method is designed to optimize it: the method adjusts the tree composition of the model by adjusting the size of the training set. The iterative method yields two PD metrics, 75PD and 50PD. On two independent test sets, 75PD gave higher Matthews correlation coefficient and F score values than the PD produced by the original training set, and the differences were statistically significant. All numerical experiments show that the closer the test data are to the training data, the better the predictor's results. These results indicate that the iterative method can optimize the proximity distance definition and that the distance information provided by PD can be used to indicate the reliability of prediction results.
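
The idea of screening test data by random forest proximity distance can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common definition of proximity (the fraction of trees in which two samples fall in the same leaf, with distance taken as one minus proximity), uses scikit-learn with synthetic data in place of PPIS features, and the helper name proximity_distance and the median cutoff are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split


def proximity_distance(forest, X_a, X_b):
    """Pairwise distance 1 - proximity between rows of X_a and X_b.

    Proximity is assumed here to be the fraction of trees in which
    two samples end up in the same leaf (hypothetical helper, not
    taken from the paper).
    """
    leaves_a = forest.apply(X_a)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(X_b)  # shape (n_b, n_trees)
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return 1.0 - same_leaf.mean(axis=2)  # shape (n_a, n_b)


# Synthetic stand-in for residue feature vectors and interface labels.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Distance from each test sample to its nearest training sample.
dist = proximity_distance(forest, X_test, X_train)
nearest = dist.min(axis=1)

# Assumed screening rule: split the test set at the median distance.
near_mask = nearest <= np.median(nearest)

y_pred = forest.predict(X_test)
print("MCC, near half:", matthews_corrcoef(y_test[near_mask], y_pred[near_mask]))
print("MCC, far half: ", matthews_corrcoef(y_test[~near_mask], y_pred[~near_mask]))
```

Comparing the two scores gives a rough check of the claim that predictions are more reliable for test data lying closer to the training data. On synthetic data the gap may be small or absent, so the printed values should not be read as reproducing the paper's results.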

Source
http://dx.doi.org/10.1142/S0219720020500420

Publication Analysis

Top Keywords (keyword: count)

proximity distance: 20
random forest: 16
iterative method: 12
distance: 10
protein-protein interaction: 8
interaction site: 8
forest proximity: 8
test set: 8
distance definition: 8
independent test: 8
