The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction.

J Cheminform

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.

Published: October 2021

AI Article Synopsis

  • Structure-based drug design relies on understanding 3D structures of protein-ligand complexes, but predicting ligand-binding poses remains difficult due to limitations in scoring functions and protein flexibility.
  • This study created XGBoost classifiers to improve accuracy in identifying near-native binding poses using data from the PDBbind database, emphasizing the importance of using Extended Connectivity Interaction Features, Vina energy terms, and docking pose ranks.
  • The research shows that incorporating cross-docked poses in training significantly enhances model performance and provides new datasets and code for future studies in machine learning-based scoring functions for binding pose prediction.
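
As a rough illustration of what "identifying near-native binding poses" can mean in practice, the sketch below labels docked poses by their RMSD to the crystal ligand and computes a top-1 success rate from classifier scores. The 2.0 Å cutoff, the `Pose` record, and the function names are assumptions made for this example and are not taken from the study's code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed cutoff: poses within 2.0 Å RMSD of the crystal ligand count as near-native.
NEAR_NATIVE_RMSD = 2.0


@dataclass
class Pose:
    complex_id: str  # hypothetical identifier of the protein-ligand complex
    rmsd: float      # RMSD to the crystal pose, in Å
    score: float     # classifier probability of being near-native


def label(pose):
    """Return 1 for a near-native pose and 0 for a decoy."""
    return 1 if pose.rmsd <= NEAR_NATIVE_RMSD else 0


def top1_success_rate(poses):
    """Fraction of complexes whose highest-scored pose is near-native."""
    by_complex = defaultdict(list)
    for pose in poses:
        by_complex[pose.complex_id].append(pose)
    if not by_complex:
        raise ValueError("no poses given")
    hits = sum(
        label(max(group, key=lambda p: p.score))
        for group in by_complex.values()
    )
    return hits / len(by_complex)


# Toy data: the classifier ranks a near-native pose first for cplx_a only.
poses = [
    Pose("cplx_a", rmsd=1.2, score=0.91),
    Pose("cplx_a", rmsd=6.5, score=0.30),
    Pose("cplx_b", rmsd=1.8, score=0.40),
    Pose("cplx_b", rmsd=7.1, score=0.75),
]
print(top1_success_rate(poses))
```

With this toy data, one of the two complexes has a near-native pose ranked first, so the success rate is 0.5.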

Article Abstract

Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein-ligand complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking because of deficiencies in scoring functions (SFs) and the neglect of protein flexibility upon ligand binding. In this study, using a cross-docking dataset purpose-built from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training and test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as features achieves the best performance, as validated by random splitting and refined-core splitting and by testing on re-docked and cross-docked poses. In addition, although performance decreases significantly under threefold clustered cross-validation, including the Vina energy terms effectively secures a lower bound on model performance and thus improves generalization capability. Our results also highlight the importance of incorporating cross-docked poses into the training of SFs to achieve a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for predicting protein-ligand binding poses.
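
The abstract mentions threefold clustered cross-validation without describing how the clusters were built. The sketch below shows one possible way to form such folds, assuming each sample already carries a cluster label (for example from protein similarity clustering) and that whole clusters are assigned to folds so no cluster appears in both training and test data. The function name and the greedy balancing rule are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter


def clustered_folds(cluster_ids, n_folds=3):
    """Assign each sample to a fold so that every cluster stays within one fold.

    cluster_ids: one (hypothetical) cluster label per sample.
    Returns a list with the fold index of each sample.
    """
    sizes = Counter(cluster_ids)
    fold_sizes = [0] * n_folds
    fold_of_cluster = {}
    # Greedy balancing: place the largest clusters first into the emptiest fold.
    ordered = sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for cluster, size in ordered:
        target = fold_sizes.index(min(fold_sizes))
        fold_of_cluster[cluster] = target
        fold_sizes[target] += size
    return [fold_of_cluster[c] for c in cluster_ids]


# Toy example with made-up cluster labels.
clusters = ["kinase", "kinase", "kinase", "protease", "protease", "gpcr", "nr"]
folds = clustered_folds(clusters)
for k in range(3):
    train_idx = [i for i, f in enumerate(folds) if f != k]
    test_idx = [i for i, f in enumerate(folds) if f == k]
    print(k, train_idx, test_idx)
```

Because whole clusters move together, a test fold never contains a sample from a cluster seen during training, which is the property that makes this kind of split harder than a random one.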

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8520186
DOI: http://dx.doi.org/10.1186/s13321-021-00560-w

Publication Analysis

Top Keywords

  • cross-docked poses: 16
  • protein-ligand binding: 12
  • binding pose: 8
  • pose prediction: 8
  • binding poses: 8
  • vina energy: 8
  • energy terms: 8
  • poses: 7
  • binding: 6
  • performance: 5

Similar Publications

In recent years, the outbreak of infectious disease caused by Zika virus (ZIKV) has posed a major threat to global public health, calling for the development of therapeutics to treat ZIKV disease. Several possible druggable targets involved in virus replication have been identified. In search of additional potential inhibitors, we screened 2895 FDA-approved compounds against Non-Structural Protein 5 (NS5) using in silico virtual screening methods.

Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer.

J Med Chem

August 2022

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.

The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential.

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard data set of sufficient size to compare performance between models.

The binding modes of well-known MurD inhibitors have been studied using molecular docking and molecular dynamics (MD) simulations. The docking results for inhibitors 1-30 revealed a similar mode of interaction with Escherichia coli-MurD. Further, residues Thr36, Arg37, His183, Lys319, Lys348, Thr321, Ser415 and Phe422 are found to be important for inhibitors and E.
