The introduction of AlphaFold 2 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.
The AlphaFold Database Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data.
While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses.
Insulin-like growth factor (IGF) signaling is highly conserved and tightly regulated by proteases including Pregnancy-Associated Plasma Protein A (PAPP-A). PAPP-A and its paralog PAPP-A2 are metalloproteases that mediate IGF bioavailability through cleavage of IGF binding proteins (IGFBPs). Here, we present single-particle cryo-EM structures of the catalytically inactive mutant PAPP-A (E483A) in complex with a peptide from its substrate IGFBP5 (PAPP-A) and also in its substrate-free form, by leveraging the power of AlphaFold to generate a high quality predicted model as a starting template.
Recognition of promoters in bacterial RNA polymerases (RNAPs) is controlled by sigma subunits. The key sequence motif recognized by the sigma, the -10 promoter element, is located in the non-template strand of the double-stranded DNA molecule ~10 nucleotides upstream of the transcription start site. Here, we explain the mechanism by which the phage AR9 non-virion RNAP (nvRNAP), a bacterial RNAP homolog, recognizes the -10 element of its deoxyuridine-containing promoter in the template strand.
Glycoprotein 2 (GP2) and uromodulin (UMOD) filaments protect against gastrointestinal and urinary tract infections by acting as decoys for bacterial fimbrial lectin FimH. By combining AlphaFold2 predictions with X-ray crystallography and cryo-EM, we show that these proteins contain a bipartite decoy module whose new fold presents the high-mannose glycan recognized by FimH. The structure rationalizes UMOD mutations associated with kidney diseases and visualizes a key epitope implicated in cast nephropathy.
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.
How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays.
We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the "human" category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different to the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins.
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure.
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance as the structure of a protein largely determines its function; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information.
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network.
Single-molecule force spectroscopy has proven extremely beneficial in elucidating folding pathways for membrane proteins. Here, we simulate these measurements, conducting hundreds of unfolding trajectories using our fast Upside algorithm at speeds slow enough to reproduce key experimental features that may be missed using all-atom methods. The speed also enables us to determine the logarithmic dependence of the rupture levels on pulling velocity to better compare to experimental values.
To address the large gap between time scales that can be easily reached by molecular simulations and those required to understand protein dynamics, we present a rapid self-consistent approximation of the side chain free energy at every integration step. In analogy with the adiabatic Born-Oppenheimer approximation for electronic structure, the protein backbone dynamics are simulated as proceeding according to the dictates of the free energy of an instantaneously-equilibrated side chain potential. The side chain free energy is computed on the fly, allowing the protein backbone dynamics to traverse a greatly smoothed energetic landscape.
An ongoing challenge in protein chemistry is to identify the underlying interaction energies that capture protein dynamics. The traditional trade-off in biomolecular simulation between accuracy and computational efficiency is predicated on the assumption that detailed force fields are typically well-parameterized, obtaining a significant fraction of possible accuracy. We re-examine this trade-off in the more realistic regime in which parameterization is a greater source of error than the level of detail in the force field.
We use the statistics of a large and curated training set of transmembrane helical proteins to develop a knowledge-based potential that accounts for the dependence on both the depth of burial of the protein in the membrane and the degree of side-chain exposure. Additionally, the statistical potential includes depth-dependent energies for unsatisfied backbone hydrogen bond donors and acceptors, which are found to be relatively small, ∼2 RT. Our potential accurately places known proteins within the bilayer.
Best et al. claim that we provide no convincing basis to assert that a discrepancy remains between FRET and SAXS results on the dimensions of disordered proteins under physiological conditions. We maintain that a clear discrepancy is apparent in our and other recent publications, including results shown in the Best et al. comment. A plausible origin is fluorophore interactions in FRET experiments.
A substantial fraction of the proteome is intrinsically disordered, and even well-folded proteins adopt non-native geometries during synthesis, folding, transport, and turnover. Characterization of intrinsically disordered proteins (IDPs) is challenging, in part because of a lack of accurate physical models and the difficulty of interpreting experimental results. We have developed a general method to extract the dimensions and solvent quality (self-interactions) of IDPs from a single small-angle x-ray scattering measurement.