Modeling the conformational heterogeneity of protein-small molecule systems is an outstanding challenge. We reasoned that while residue-level descriptions of biomolecules are efficient for de novo structure prediction, for probing heterogeneity of interactions with small molecules in the folded state an entirely atomic-level description could have advantages in speed and generality. We developed a graph neural network called ChemNet trained to recapitulate correct atomic positions from partially corrupted input structures from the Cambridge Structural Database and the Protein Data Bank; the nodes of the graph are the atoms in the system.
J Chem Theory Comput
April 2024
Mapping the ensemble of protein conformations that contribute to function and can be targeted by small molecule drugs remains an outstanding challenge. Here, we explore the use of variational autoencoders for reducing the challenge of dimensionality in the protein structure ensemble generation problem. We convert high-dimensional protein structural data into a continuous, low-dimensional representation, carry out a search in this space guided by a structure quality metric, and then use RoseTTAFold guided by the sampled structural information to generate 3D structures.
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules.
Despite transformative advances in protein design with deep learning, the design of small-molecule-binding proteins and sensors for arbitrary ligands remains a grand challenge. Here we combine deep learning and physics-based methods to generate a family of proteins with diverse and designable pocket geometries, which we employ to computationally design binders for six chemically and structurally distinct small-molecule targets. Biophysical characterization of the designed binders revealed nanomolar to low micromolar binding affinities and atomic-level design accuracy.
Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in the engineering of DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for the design of small DBPs that recognize specific target sequences through interactions with bases in the major groove, and employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets.
De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds, but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine and 2-deoxycoelenterazine.
Comput Struct Biotechnol J
December 2022
While deep learning (DL) has revolutionized protein structure prediction, an important question remains: how can this revolution be translated into advances in structure-based drug discovery? Because the lessons from the recent GPCR dock challenge were inconclusive, primarily due to the size of the dataset, in this work we further examined 70 diverse GPCR complexes bound to either small molecules or peptides to investigate best-practice modeling and docking strategies for GPCR drug discovery. Our quantitative analysis shows that advances in DL-based protein structure prediction have enabled substantial improvements in docking and virtual screening relative to the results expected from the combination of the best pre-DL tools.
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure.
Curr Protoc Protein Sci
December 2020
While native proteins cover diverse structural spaces and carry out a wide range of biological functions, few of them can directly serve human needs. One reason is that native proteins usually contain idiosyncrasies that evolved for their native functions but are poorly suited to engineering requirements. To overcome this issue, one strategy is to create de novo proteins designed to possess improved stability, high environmental tolerance, and enhanced engineering potential.
Because proteins generally fold to their lowest free energy states, energy-guided refinement in principle should be able to systematically improve the quality of protein structure models generated using homologous structure or co-evolution derived information. However, because of the high dimensionality of the search space, there are far more ways to degrade the quality of a near native model than to improve it, and hence, refinement methods are very sensitive to energy function errors. In the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP13), we sought to carry out a thorough search for low energy states in the neighborhood of a starting model using restraints to avoid straying too far.
The 3D structure of a protein can be predicted from its amino acid sequence with high accuracy for a large fraction of cases because of the availability of large quantities of experimental data and the advance of computational algorithms. Recently, deep learning methods exploiting the coevolution information obtained by comparing related protein sequences have been successfully used to generate highly accurate model structures even in the absence of template structure information. However, structures predicted based on either template structures or related sequences require further improvement in regions for which information is missing.
Proc Natl Acad Sci U S A
August 2018
Wnt signaling is initiated by Wnt ligand binding to the extracellular ligand binding domain, called the cysteine-rich domain (CRD), of a Frizzled (Fzd) receptor. Norrin, an atypical Fzd ligand, specifically interacts with Fzd4 to activate β-catenin-dependent canonical Wnt signaling. Much of the molecular basis that confers Norrin selectivity in binding to Fzd4 was revealed through the structural study of the Fzd4-Norrin complex.
The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed.
Advances in protein model refinement techniques are required as diverse sources of protein structure information are available from low-resolution experiments or informatics-based computations such as cryo-EM, NMR, homology models, or predicted residue contacts. Given semi-reliable or incomplete structural information, structure quality of a protein model has to be improved by ab initio methods such as energy-based simulation. In this study, we describe a new automatic refinement server method designed to improve locally inaccurate regions and overall structure simultaneously.
Stable tissue integrity during embryonic development relies on the function of the cadherin·catenin complex (CCC). The CCC is a useful paradigm for analyzing requirements for specific interactions among the core components of the CCC, and it provides a unique opportunity to examine evolutionarily conserved mechanisms that govern the interaction between α- and β-catenin. HMP-1, unlike its mammalian homolog α-catenin, is constitutively monomeric, and its binding affinity for HMP-2/β-catenin is higher than that of α-catenin for β-catenin.
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences.
G-protein-coupled receptors (GPCRs) play important physiological roles related to signal transduction and form a major group of drug targets. Prediction of GPCR-ligand complex structures therefore has important implications for drug discovery. With previously available servers, it was only possible to first predict GPCR structures by homology modeling and then perform ligand docking on the model structures.
We present the results for CAPRI Round 30, the first joint CASP-CAPRI experiment, which brought together experts from the protein structure prediction and protein-protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank.
We analyze the results of the GalaxyDock protein-ligand docking program in the two recent experiments of Community Structure-Activity Resource (CSAR), CSAR 2013 and 2014. GalaxyDock performs global optimization of a modified AutoDock3 energy function by employing the conformational space annealing method. The energy function of GalaxyDock is quite sensitive to atomic clashes.
Protein structures predicted by state-of-the-art template-based methods may still have errors when the template proteins are not similar enough to the target protein. Overall target structure may deviate from the template structures owing to differences in sequences. Structural information for some local regions such as loops may not be available when there are sequence insertions or deletions.