We describe our winning submission to the first Critical Assessment of Computational Hit-Finding Experiments (CACHE) challenge. In this challenge, 23 participants employed a diverse array of structure-based methods to identify hits to a target with no known ligands. We utilized two methods, pharmacophore search and molecular docking, to identify our initial hit list and compounds for the hit expansion phase.
Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Flow matching is a recently proposed generative modeling framework that has achieved impressive performance on a variety of tasks including those on biomolecular structures. The seminal flow matching framework was developed only for continuous data.
Molecular interactions between proteins and their ligands are important for drug design. A pharmacophore consists of favorable molecular interactions in a protein binding site and can be utilized for virtual screening. Pharmacophores are easiest to identify from co-crystal structures of a bound protein-ligand complex.
We present a novel and interpretable approach for assessing small-molecule binding using context explanation networks. Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of precalculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs inactive compounds for many systems.
Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible.
The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding.
Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation.
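For readers unfamiliar with flow matching, the following is a minimal sketch of the generic idea, assuming the common straight-line (linear interpolant) conditional path between a noise sample and a data sample. It is not the model described in the abstract: the function names `cfm_training_pair`, `cfm_loss`, and `euler_sample` are hypothetical, and a real system would replace the velocity function with a trained neural network operating on molecular graphs.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build one regression example for a straight-line conditional path.

    x0: sample from the prior (e.g. Gaussian noise), x1: data sample,
    t: time in [0, 1]. Returns the interpolated point x_t and the
    target velocity u_t that a model would be trained to predict.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    x_t = (1.0 - t) * x0 + t * x1  # point on the line from x0 to x1
    u_t = x1 - x0                  # constant velocity along that line
    return x_t, u_t


def cfm_loss(predicted_velocity, u_t):
    """Mean squared error between predicted and target velocities."""
    diff = np.asarray(predicted_velocity, dtype=float) - u_t
    return float(np.mean(diff ** 2))


def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x
```

With a perfect velocity predictor, integrating from the prior sample recovers the data sample; in practice the velocity is learned by minimizing `cfm_loss` over many random `(x0, x1, t)` triples.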
Embeddings from protein language models (PLMs) capture intricate patterns for protein sequences, enabling more accurate and efficient prediction of protein properties. Incorporating protein structure information as direct input into PLMs improves the predictive ability of protein embeddings on downstream tasks. In this work we demonstrate that indirectly infusing structure information into PLMs also leads to performance gains on structure-related tasks.
Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information.
Protein structure predictions from deep learning models like AlphaFold2, despite their remarkable accuracy, are likely insufficient for direct use in downstream tasks like molecular docking. The functionality of such models could be improved with a combination of increased accuracy and physical intuition. We propose a new method to train deep learning protein structure prediction models using molecular dynamics force fields to work toward these goals.
J Comput Aided Mol Des
December 2023
Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for the prediction of non-covalent binding of a protein and ligand system. Molecular docking pipelines often only utilize the information of one ligand binding to the protein despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor.
We performed exhaustive torsion sampling on more than 3 million compounds using the GFN2-xTB method and performed a comparison of experimental crystallographic and gas-phase conformers. Many conformer sampling methods derive torsional angle distributions from experimental crystallographic data, limiting the torsion preferences to molecules that must be stable, synthetically accessible, and able to be crystallized. In this work, we evaluate the differences in torsional preferences of experimental crystallographic geometries and gas-phase computed conformers from a broad selection of compounds to determine whether torsional angle distributions obtained from semiempirical methods are suitable priors for conformer sampling.
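As a rough illustration of what a torsional angle distribution is, the sketch below computes a signed dihedral angle from four atom positions and bins a set of angles into a normalized histogram. It is a generic geometric calculation, not the authors' pipeline; the function names `dihedral_degrees` and `torsion_histogram` and the default of 36 bins are assumptions made for this example.

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    The angle is measured about the central bond p1-p2.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)       # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1       # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1       # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def torsion_histogram(angles_deg, n_bins=36):
    """Normalized histogram of torsion angles over [-180, 180] degrees."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    counts, _ = np.histogram(angles_deg, bins=edges)
    total = counts.sum()
    return counts / total if total else counts.astype(float)
```

Histograms built this way from crystallographic geometries and from computed gas-phase conformers could then be compared bin by bin, which is one simple way to look at differences in torsional preference.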
Lung cancer is the leading cause of cancer deaths worldwide. We found that the cytochrome P450 isoform CYP4F11 is significantly overexpressed in patients with lung squamous cell carcinoma. CYP4F11 is a fatty acid ω-hydroxylase and catalyzes the production of the lipid mediator 20-hydroxyeicosatetraenoic acid (20-HETE) from arachidonic acid.
The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor-ligand binding affinity data.
We present a novel and interpretable approach for predicting small-molecule binding affinities using context explanation networks (CENs). Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of pre-calculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs.
Conformer generation, the assignment of realistic 3D coordinates to a small molecule, is fundamental to structure-based drug design. Conformational ensembles are required for rigid-body matching algorithms, such as shape-based or pharmacophore approaches, and even methods that treat the ligand flexibly, such as docking, are dependent on the quality of the provided conformations due to not sampling all degrees of freedom (e.g.
Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose.
The lead optimization phase of drug discovery refines an initial hit molecule for desired properties, especially potency. Synthesis and experimental testing of the small perturbations during this refinement can be quite costly and time-consuming. Relative binding free energy (RBFE, also referred to as ΔΔG) methods allow the estimation of binding free energy changes after small changes to a ligand scaffold.
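For context, the relative binding free energy between two ligands A and B, conventionally written ΔΔG, can be expressed with the standard thermodynamic cycle below. This is the textbook definition rather than anything specific to the method in the abstract; the subscripts "complex" and "solvent" are labels chosen here for the alchemical transformation of A into B in the bound and unbound states.

```latex
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
```

A negative value indicates that B is predicted to bind more favorably than A.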
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein-ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site.
Virtual screening, predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy.
Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure and can be extended by users to include new protein structures as they are released.
Machine learning has been increasingly applied to the field of computer-aided drug discovery in recent years, leading to notable advances in binding-affinity prediction, virtual screening, and QSAR. Surprisingly, it is less often applied to lead optimization, the process of identifying chemical fragments that might be added to a known ligand to improve its binding affinity. We here describe a deep convolutional neural network that predicts appropriate fragments given the structure of a receptor/ligand complex.
Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.
J Chem Inf Model
June 2021
While accurate prediction of aqueous solubility remains a challenge in drug discovery, machine learning (ML) approaches have become increasingly popular for this task. For instance, in the Second Challenge to Predict Aqueous Solubility (SC2), all groups utilized machine learning methods in their submissions. We present SolTranNet, a molecule attention transformer to predict aqueous solubility from a molecule's SMILES representation.