Boosting Granular Support Vector Machines for the Accurate Prediction of Protein-Nucleotide Binding Sites.

Yi-Heng Zhu, Jun Hu, Yong Qi, Xiao-Ning Song, Dong-Jun Yu

Comb Chem High Throughput Screen

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Published: September 2020

Accurate identification of protein-ligand binding sites is crucial for understanding protein functions and aiding drug development, but current machine-learning methods struggle due to class imbalance between binding and non-binding residues.
This study introduces Boosting Multiple Granular Support Vector Machines (BGSVM), which addresses class imbalance by training each base SVM on all binding (minority) residues plus a selected subset of non-binding (majority) residues.
The resulting BGSVM-NUC predictor outperforms several existing sequence-based methods at predicting protein-nucleotide binding sites and is freely available online for academic use.

Aim And Objective: The accurate identification of protein-ligand binding sites helps elucidate protein function and facilitate the design of new drugs. Machine-learning-based methods have been widely used for the prediction of protein-ligand binding sites. Nevertheless, the severe class imbalance phenomenon, where the number of nonbinding (majority) residues is far greater than that of binding (minority) residues, has a negative impact on the performance of such machine-learning-based predictors.

Materials And Methods: In this study, we aim to relieve the negative impact of class imbalance by Boosting Multiple Granular Support Vector Machines (BGSVM). In BGSVM, each base SVM is trained on a granular training subset consisting of all minority samples and some reasonably selected majority samples. The efficacy of BGSVM for dealing with class imbalance was validated by benchmarking it with several typical imbalance learning algorithms. We further implemented a protein-nucleotide binding site predictor, called BGSVM-NUC, with the BGSVM algorithm.
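The granular-subset idea described above can be illustrated with a minimal Python sketch. It only shows how subsets containing all minority samples plus some majority samples might be built. The abstract does not state how majority samples are "reasonably selected" or how the base SVMs are boosted, so the random draw, the function name `granular_subsets`, and the `ratio` parameter are illustrative assumptions, not the authors' method.

```python
import random
from typing import List, Sequence


def granular_subsets(labels: Sequence[int], n_subsets: int,
                     ratio: float = 1.0, seed: int = 0) -> List[List[int]]:
    """Build index lists for training base classifiers.

    Each list holds every minority sample (label 1) plus a draw of
    majority samples (label 0). Random selection is an assumption made
    here for illustration; the abstract does not give the actual rule.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority samples per subset, relative to the minority count.
    n_major = min(len(majority), max(1, round(ratio * len(minority))))
    subsets = []
    for _ in range(n_subsets):
        chosen = rng.sample(majority, n_major)
        subsets.append(sorted(minority + chosen))
    return subsets


# Hypothetical toy data: 5 binding residues among 100 residues.
labels = [1] * 5 + [0] * 95
subsets = granular_subsets(labels, n_subsets=3)
```

Each index list would then be used to train one base SVM, so that no single classifier sees the full imbalance between binding and nonbinding residues.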

Results: Rigorous cross-validation and independent validation tests for five types of protein-nucleotide interactions demonstrated that the proposed BGSVM-NUC achieves promising prediction performance and outperforms several popular sequence-based protein-nucleotide binding site predictors. The BGSVM-NUC web server is freely available at http://csbio.njust.edu.cn/bioinf/BGSVM-NUC/ for academic use.

DOI: http://dx.doi.org/10.2174/1386207322666190925125524
