Transferable deep generative modeling of intrinsically disordered protein conformations.

PLoS Comput Biol

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America.

Published: May 2024

Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder, which learns a representation of protein geometry, with a diffusion model, which samples novel conformations in the encoded space. idpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences that have no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
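
As a rough illustration of the two-stage design described in the abstract (diffusion sampling in an encoded space, followed by decoding to 3D coordinates), the sketch below runs standard DDPM ancestral sampling over latent vectors and then decodes them. It is not the idpSAM implementation: the function names, the linear noise schedule, the step count, the latent size, and the toy noise predictor and decoder are all assumptions standing in for the trained transformer networks.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(z_t, t, seq_embedding):
    """Hypothetical stand-in for a trained, sequence-conditioned noise predictor.

    z_t has shape (n_samples, seq_len, channels); seq_embedding has shape
    (seq_len, channels) and is broadcast over the sample axis.
    """
    return 0.1 * z_t + 0.01 * seq_embedding


def toy_decoder(z):
    """Hypothetical stand-in decoder: latents (n, L, C) -> coordinates (n, L, 3).

    A fixed random projection gives per-residue displacement vectors, and a
    cumulative sum along the sequence turns them into a chain-like trace.
    """
    proj = np.random.default_rng(0).standard_normal((z.shape[-1], 3))
    steps = z @ proj
    return np.cumsum(steps, axis=1)


def generate_ensemble(seq_len, n_samples, channels=16, num_steps=50, seed=0):
    """Sample latents by DDPM ancestral sampling, then decode them."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Placeholder for a learned encoding of the amino acid sequence.
    seq_embedding = rng.standard_normal((seq_len, channels))
    # Start from pure Gaussian noise in the latent space.
    z = rng.standard_normal((n_samples, seq_len, channels))
    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(z, t, seq_embedding)
        # Posterior mean of the reverse step given the predicted noise.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return toy_decoder(z)


if __name__ == "__main__":
    ensemble = generate_ensemble(seq_len=20, n_samples=4)
    print(ensemble.shape)
```

Sampling in the latent space rather than directly over coordinates is the design choice the abstract highlights; in this sketch the decoder is the only step that touches 3D geometry.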


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11152266 (PMC)
http://dx.doi.org/10.1371/journal.pcbi.1012144 (DOI)

Publication Analysis

Top Keywords

intrinsically disordered: 12
disordered protein: 12
structural ensembles: 12
deep generative: 8
disordered proteins: 8
conformational ensembles: 8
machine learning: 8
diffusion model: 8
neural networks: 8
training set: 8

Similar Publications

Mechanisms for DNA Interplay in Eukaryotic Transcription Factors.

Annu Rev Biophys

January 2025

CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA.

Like their prokaryotic counterparts, eukaryotic transcription factors must recognize specific DNA sites, search for them efficiently, and bind to them to help recruit or block the transcription machinery. For eukaryotic factors, however, the genetic signals are extremely complex and scattered over vast, multichromosome genomes, while the DNA interplay occurs in a varying landscape defined by chromatin remodeling events and epigenetic modifications. Eukaryotic factors are rich in intrinsically disordered regions and are also distinct in their recognition of short DNA motifs and utilization of open DNA interaction interfaces as ways to gain access to DNA on nucleosomes.


Hydrogen/deuterium exchange mass spectrometry (HDX-MS) is a powerful technique to interrogate protein structure and dynamics. With the ability to study almost any protein without a size limit, including intrinsically disordered ones, HDX-MS has shown fast growing importance as a complement to structural elucidation techniques. Current experiments compare two or more related conditions (sequences, interaction partners, excipients, conformational states, etc.


SnRK2 kinases sense molecular crowding and form condensates to disrupt ABI1 inhibition.

Sci Adv

January 2025

Key Laboratory of Plant Carbon Capture, Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.

Plants sense and respond to hyperosmotic stress via quick activation of sucrose nonfermenting 1-related protein kinase 2 (SnRK2). Under unstressed conditions, the protein phosphatase type 2C (PP2C) in clade A interact with and inhibit SnRK2s in subgroup III, which are released from the PP2C inhibition via pyrabactin resistance 1-like (PYL) abscisic acid receptors. However, how SnRK2s are released under osmotic stress is unclear.


The attention mechanism has significantly progressed in various point cloud tasks. Benefiting from its significant competence in capturing long-range dependencies, research in point cloud completion has achieved promising results. However, the typically disordered point cloud data features complicated non-Euclidean geometric structures and exhibits unpredictable behavior.


The mammalian high mobility group protein AT-hook 2 (HMGA2) is a small DNA-binding protein that specifically targets AT-rich DNA sequences. Structurally, HMGA2 is an intrinsically disordered protein (IDP), comprising three positively charged 'AT-hooks' and a negatively charged C-terminus. HMGA2 can form homodimers through electrostatic interactions between its 'AT-hooks' and C-terminus.

