Protein pKa Prediction by Tree-Based Machine Learning.

J Chem Theory Comput

Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.

Published: April 2022

Protonation states of ionizable protein residues modulate many essential biological processes. Modeling and understanding these processes correctly requires accurate determination of the residues' pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values of 0.56 and 0.78, respectively, when considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
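
The RMSE figures quoted above (overall, surface, and buried) can be illustrated with a minimal sketch. This is not the authors' code: the records below are hypothetical values invented for illustration, the `is_buried` flag stands in for whatever burial criterion the paper uses, and `relative_improvement` assumes that "X% better" means a percent reduction in RMSE relative to the baseline, which the abstract does not state explicitly.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between two equal-length sequences of pKa values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(rmse_model, rmse_baseline):
    """Percent reduction in RMSE versus a baseline (assumed definition of 'better')."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical records: (predicted pKa, experimental pKa, is_buried).
# These numbers are made up and do not come from the paper's datasets.
records = [
    (4.0, 3.5, False),
    (4.5, 4.5, False),
    (6.0, 7.0, True),
    (10.0, 9.0, True),
]


def subset_rmse(rows):
    """RMSE over a list of (predicted, experimental, is_buried) rows."""
    return rmse([r[0] for r in rows], [r[1] for r in rows])


overall = subset_rmse(records)
surface = subset_rmse([r for r in records if not r[2]])
buried = subset_rmse([r for r in records if r[2]])

print(f"overall={overall:.2f} surface={surface:.2f} buried={buried:.2f}")
```

Reporting surface and buried residues separately, as the abstract does, shows whether a model's error is concentrated in buried sites, where the abstract reports the larger RMSE.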

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10510853
DOI: http://dx.doi.org/10.1021/acs.jctc.1c01257

