The surge of interest in neural networks over the past decade has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance deteriorates dramatically when all data for a given inhibitor are placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
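The two splitting schemes and the copy-from-nearest-neighbor baseline described above can be illustrated with a minimal sketch. This is not the authors' code: the toy records, the function names, and the use of a plain string-similarity ratio on SMILES as the notion of "closest" are all assumptions made for illustration.

```python
import random
from difflib import SequenceMatcher

# Toy records: (inhibitor SMILES, kinase identifier, affinity).
# Values are invented for illustration, not taken from any dataset.
records = [
    ("CCO", "kinase_A", 5.1),
    ("CCO", "kinase_B", 6.0),
    ("CCN", "kinase_A", 5.3),
    ("CCN", "kinase_B", 6.2),
    ("c1ccccc1", "kinase_A", 7.4),
    ("c1ccccc1", "kinase_B", 7.9),
    ("c1ccccc1O", "kinase_A", 7.0),
    ("c1ccccc1O", "kinase_B", 8.1),
]

def random_split(rows, test_fraction=0.25, seed=0):
    """Split individual data points at random.
    The same inhibitor can end up in both the training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def split_by_inhibitor(rows, test_fraction=0.25, seed=0):
    """Hold out whole inhibitors, so no test SMILES appears in training."""
    smiles = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(smiles)
    n_test = max(1, round(len(smiles) * test_fraction))
    held_out = set(smiles[:n_test])
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test

def nearest_neighbor_predict(train, smiles, kinase):
    """Copy the affinity of the most similar training point.
    Assumption: similarity is a character-level ratio between SMILES
    strings, restricted to the same kinase when such points exist."""
    candidates = [r for r in train if r[1] == kinase] or train
    best = max(candidates,
               key=lambda r: SequenceMatcher(None, smiles, r[0]).ratio())
    return best[2]

if __name__ == "__main__":
    train, test = split_by_inhibitor(records)
    for smi, kin, measured in test:
        copied = nearest_neighbor_predict(train, smi, kin)
        print(smi, kin, "measured:", measured, "copied:", copied)
```

Under the random split, a test point's inhibitor usually also appears in training (paired with a different kinase), so a lookup-style baseline of this kind can score well without learning anything about binding. Holding out whole inhibitors removes that shortcut.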

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508770
DOI: http://dx.doi.org/10.1101/2023.09.04.556234

Similar Publications

PHIStruct: Improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings.

Bioinformatics

January 2025

Bioinformatics Lab, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, 1004, Philippines.

Motivation: Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.

Results: We present PHIStruct, a multilayer perceptron that takes in structure-aware embeddings of receptor-binding proteins, generated via the structure-aware protein language model SaProt, and then predicts the host from among the ESKAPEE genera.


Motivation: Predicting RNA-binding proteins (RBPs) is central to understanding post-transcriptional regulatory mechanisms. Here, we introduce EnrichRBP, an automated and interpretable computational platform specifically designed for the comprehensive analysis of RBP interactions with RNA.

Results: EnrichRBP is a web service that enables researchers to develop original deep learning and machine learning architectures to explore the complex dynamics of RNA-binding proteins.


Ligand Reorganization for End-Point Binding Free Energy Calculations: Identifying Preferred Poses of Fentanyls in the μ Opioid Receptor.

J Chem Theory Comput

January 2025

Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse - Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States.

We have developed a method that uses energy landscapes of unbound and bound ligands to compute reorganization free energies for end-point binding free-energy calculations. The method is applied to our previous simulations of fentanyl derivatives bound to the μ opioid receptor in different orientations. Whereas the mean interaction energy provides an ambiguous ranking of binding poses, interaction entropy and ligand reorganization strongly penalize geometric decoys such that native poses observed in CryoEM structures are best ranked.


Exploring the potential of compound-protein complex structure-free models in virtual screening using BlendNet.

Brief Bioinform

November 2024

Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea.

Identifying new compounds that interact with a target is a crucial time-limiting step in the initial phases of drug discovery. Compound-protein complex structure-based affinity prediction models can expedite this process; however, their dependence on high-quality three-dimensional (3D) complex structures limits their practical application. Prediction models that do not require 3D complex structures for binding-affinity estimation offer a theoretically attractive alternative; however, accurately predicting affinity without interaction information presents significant challenges.


The Evolving T Cell Receptor Recognition Code: The Rules Are More Like Guidelines.

Immunol Rev

January 2025

Department of Chemistry and Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, Notre Dame, Indiana, USA.

αβ T cell receptor (TCR) recognition of peptide-MHC complexes lies at the core of adaptive immunity, balancing specificity and cross-reactivity to facilitate effective antigen discrimination. Early structural studies established basic frameworks helpful for understanding and contextualizing TCR recognition and features such as peptide specificity and MHC restriction. However, the growing TCR structural database and studies launched from structural work continue to reveal exceptions to common assumptions and simplifications derived from earlier work.

