Boosting Granular Support Vector Machines for the Accurate Prediction of Protein-Nucleotide Binding Sites.

Comb Chem High Throughput Screen

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Published: September 2020

AI Article Synopsis

  • Accurate identification of protein-ligand binding sites is crucial for understanding protein functions and aiding drug development, but current machine-learning methods struggle due to class imbalance between binding and non-binding residues.
  • This study introduces Boosting Multiple Granular Support Vector Machines (BGSVM), which addresses class imbalance by training each base SVM on all binding residues plus a selected subset of non-binding residues.
  • The new BGSVM-NUC predictor shows improved performance in predicting protein-nucleotide interactions compared to existing methods and is available for free use online.

Article Abstract

Aim And Objective: The accurate identification of protein-ligand binding sites helps elucidate protein function and facilitates the design of new drugs. Machine-learning-based methods have been widely used to predict protein-ligand binding sites. However, severe class imbalance, in which nonbinding (majority) residues far outnumber binding (minority) residues, degrades the performance of such machine-learning-based predictors.
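
To make the effect of imbalance concrete, the sketch below scores a degenerate predictor that labels every residue as nonbinding. The 50/950 residue split is hypothetical. The metrics (accuracy, sensitivity, Matthews correlation coefficient) are assumed for illustration, because the abstract does not state which metrics the study reports.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on the binding class) and MCC for labels in {1, 0}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # convention: 0 when undefined
    return acc, sen, mcc


# Hypothetical protein: 50 binding (1) and 950 nonbinding (0) residues.
y_true = [1] * 50 + [0] * 950
# A degenerate predictor that labels every residue as nonbinding.
y_pred = [0] * 1000
print(binary_metrics(y_true, y_pred))  # (0.95, 0.0, 0.0)
```

Accuracy looks high only because the majority class dominates. Sensitivity and MCC show that not a single binding residue is found.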

Materials And Methods: In this study, we aim to relieve the negative impact of class imbalance by Boosting Multiple Granular Support Vector Machines (BGSVM). In BGSVM, each base SVM is trained on a granular training subset consisting of all minority samples and a carefully selected subset of majority samples. The efficacy of BGSVM in dealing with class imbalance was validated by benchmarking it against several typical imbalance-learning algorithms. We further implemented a protein-nucleotide binding site predictor, called BGSVM-NUC, based on the BGSVM algorithm.
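
The abstract does not specify how the majority samples are selected or how the base SVMs are combined. As a rough illustration only, the following Python sketch assumes an AdaBoost-style scheme with these steps:

- Each round, every minority sample is kept.
- An equal number of majority samples are drawn with replacement in proportion to their boosting weights.
- The base SVMs are combined by a weighted vote.

All function names are hypothetical. The linear, Pegasos-style SVM stands in for whatever SVM solver and kernel BGSVM-NUC actually uses.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM fitted by Pegasos-style subgradient descent on the hinge loss.
    A constant 1.0 is appended to every sample so the bias is learned as a weight."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, data[i]) < 1.0   # margin test uses the old w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


def train_bgsvm(X, y, rounds=5, seed=0):
    """Hypothetical BGSVM-style ensemble (labels: +1 minority, -1 majority).
    Each round trains an SVM on a balanced granule: all minority samples plus an
    equal number of majority samples drawn with replacement by boosting weight."""
    rng = random.Random(seed)
    n = len(X)
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    sample_w = [1.0 / n] * n
    ensemble = []  # list of (alpha, svm_weights)
    for r in range(rounds):
        picked = rng.choices(majority, weights=[sample_w[i] for i in majority],
                             k=len(minority))
        granule = minority + picked
        w = train_linear_svm([X[i] for i in granule], [y[i] for i in granule],
                             seed=seed + r)
        preds = [svm_predict(w, x) for x in X]
        err = sum(sw for sw, p, yt in zip(sample_w, preds, y) if p != yt) / sum(sample_w)
        if err >= 0.5:  # no better than chance on the weighted data: stop
            break
        err = max(err, 1e-10)  # avoid division by zero when the SVM makes no errors
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w))
        # AdaBoost re-weighting: misclassified samples gain weight.
        sample_w = [sw * math.exp(-alpha * yt * p) for sw, p, yt in zip(sample_w, preds, y)]
        total = sum(sample_w)
        sample_w = [sw / total for sw in sample_w]
    return ensemble


def bgsvm_predict(ensemble, x):
    score = sum(alpha * svm_predict(w, x) for alpha, w in ensemble)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    gen = random.Random(42)
    # Toy imbalanced data: 20 "binding" (+1) vs 200 "non-binding" (-1) points.
    X = [(gen.gauss(2, 1), gen.gauss(2, 1)) for _ in range(20)]
    X += [(gen.gauss(-2, 1), gen.gauss(-2, 1)) for _ in range(200)]
    y = [1] * 20 + [-1] * 200
    model = train_bgsvm(X, y)
    acc = sum(bgsvm_predict(model, x) == t for x, t in zip(X, y)) / len(y)
    print(len(model), "base SVMs, training accuracy", round(acc, 3))
```

Keeping every minority sample in each granule is the point of the design. No base SVM ever sees a training set dominated by the majority class. The boosting weights then steer later granules toward the majority samples that earlier SVMs misclassified.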

Results: Rigorous cross-validation and independent validation tests on five types of protein-nucleotide interactions demonstrated that the proposed BGSVM-NUC achieves promising prediction performance and outperforms several popular sequence-based protein-nucleotide binding site predictors. The BGSVM-NUC web server is freely available at http://csbio.njust.edu.cn/bioinf/BGSVM-NUC/ for academic use.

Source
http://dx.doi.org/10.2174/1386207322666190925125524
