Predicting a ligand's bound pose to a target protein is a key component of early-stage computational drug discovery. Recent developments in machine learning methods have focused on improving pose quality at the cost of model runtime. For high-throughput virtual screening applications, this exposes a capability gap that can be filled by moderately accurate but fast pose prediction.
View Article and Find Full Text PDFAlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space.
View Article and Find Full Text PDFWith advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, many of which report inflated performance due to using training and testing sets with significant structural overlap.
View Article and Find Full Text PDFWith advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, many of which report inflated performance due to using training and testing sets with significant structural overlap.
View Article and Find Full Text PDFMultiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins.
View Article and Find Full Text PDFAlphaFold2 has already changed structural biology but its true power may lie in how it changes the way we think about cells and organisms. Two studies broadly analyze and assess the performance of AlphaFold2 to outline the extent of its utility and limitations in providing structural models that shed light on biological questions, including mutations, post-translational modifications, and protein-protein complex interactions.
View Article and Find Full Text PDFAlphaFold2 and related computational systems predict protein structure using deep learning and co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite high prediction accuracy achieved by these systems, challenges remain in (1) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated; (2) rapid exploration of designed structures; and (3) understanding the rules governing spontaneous polypeptide folding in solution. Here we report development of an end-to-end differentiable recurrent geometric network (RGN) that uses a protein language model (AminoBERT) to learn latent structural information from unaligned proteins.
View Article and Find Full Text PDFActa Crystallogr D Struct Biol
August 2021
The functions of most proteins result from their 3D structures, but determining their structures experimentally remains a challenge, despite steady advances in crystallography, NMR and single-particle cryoEM. Computationally predicting the structure of a protein from its primary sequence has long been a grand challenge in bioinformatics, intimately connected with understanding protein chemistry and dynamics. Recent advances in deep learning, combined with the availability of genomic data for inferring co-evolutionary patterns, provide a new approach to protein structure prediction that is complementary to longstanding physics-based approaches.
View Article and Find Full Text PDF