Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding.
View Article and Find Full Text PDFOver the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules.
View Article and Find Full Text PDFCurrent deep learning methods for structure-based virtual screening take the structures of both the protein and the ligand as input but make little or no use of the protein structure when predicting ligand binding. Here, we show how a relatively simple method of data set augmentation forces such deep learning methods to take into account information from the protein. Models trained in this way are more generalizable (make better predictions on protein/ligand complexes from a different distribution to the training data).
View Article and Find Full Text PDF