Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder that learns a representation of protein geometry with a diffusion model that samples novel conformations in the encoded space. idpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures the 3D structural ensembles of test sequences that have no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
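The two-stage design described above (an autoencoder whose decoder maps latent vectors back to geometry, plus a diffusion model that samples new latent vectors) can be sketched as follows. This is a minimal, hypothetical illustration, not idpSAM's implementation: the latent space is a toy 2-D Gaussian, the noise predictor is the exact analytical one for that Gaussian (standing in for idpSAM's trained transformer), `decode` is a made-up linear stand-in for the learned decoder, and the linear noise schedule and all constants (`T`, `LATENT_MU`, `LATENT_STD`, `N_BEADS`) are assumptions. The sampler is a generic DDPM-style ancestral sampler.

```python
import math
import random

# --- Hypothetical toy setup (assumptions, not idpSAM's actual values) ---
# Encoded conformations are assumed to follow N(LATENT_MU, LATENT_STD^2 I)
# in a 2-D latent space.
LATENT_MU = [1.0, -0.5]
LATENT_STD = 0.3
N_BEADS = 5  # hypothetical chain length produced by the toy decoder

# Assumed linear noise schedule over T diffusion steps.
T = 200
BETAS = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_running = 1.0
for _a in ALPHAS:
    _running *= _a
    ALPHA_BARS.append(_running)  # cumulative product alpha_bar_t


def predict_noise(x, t):
    """Exact E[eps | x_t] for the toy Gaussian latent distribution.

    Stands in for the trained noise-prediction network.
    """
    abar = ALPHA_BARS[t]
    var = abar * LATENT_STD ** 2 + (1.0 - abar)  # Var(x_t) per dimension
    return [math.sqrt(1.0 - abar) * (xi - math.sqrt(abar) * mi) / var
            for xi, mi in zip(x, LATENT_MU)]


def sample_latent(rng):
    """DDPM-style ancestral sampling: start from pure noise, denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in LATENT_MU]
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(BETAS[t])  # sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def decode(z):
    """Hypothetical stand-in for the learned decoder: latent -> 3-D bead coordinates."""
    return [(3.8 * i, z[0] * i, z[1] * i) for i in range(N_BEADS)]


def generate_ensemble(n, seed=0):
    """Sample n latents with the diffusion model, then decode each into a structure."""
    rng = random.Random(seed)
    latents = [sample_latent(rng) for _ in range(n)]
    return latents, [decode(z) for z in latents]


if __name__ == "__main__":
    latents, structures = generate_ensemble(200, seed=0)
    means = [sum(z[d] for z in latents) / len(latents) for d in range(2)]
    print("latent means:", [round(m, 3) for m in means])  # should lie near LATENT_MU
```

Because the noise predictor is exact for this toy distribution, the sampled latents should reproduce its mean and spread. In a real latent diffusion model the predictor is learned from encoded simulation snapshots, so sample quality depends on how well the encoder, decoder, and denoiser are trained.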
| Source | Type |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11152266 | PMC |
| http://dx.doi.org/10.1371/journal.pcbi.1012144 | DOI Listing |
Annu Rev Biophys
January 2025
CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA.
Like their prokaryotic counterparts, eukaryotic transcription factors must recognize specific DNA sites, search for them efficiently, and bind to them to help recruit or block the transcription machinery. For eukaryotic factors, however, the genetic signals are extremely complex and scattered over vast, multichromosome genomes, and their interplay with DNA occurs in a changing landscape shaped by chromatin remodeling events and epigenetic modifications. Eukaryotic factors are rich in intrinsically disordered regions and are also distinguished by their recognition of short DNA motifs and their use of open DNA interaction interfaces to gain access to DNA on nucleosomes.
Anal Chem
January 2025
School of Molecular and Cellular Biology and Astbury Centre, University of Leeds, Leeds LS2 9JT, U.K.
Hydrogen/deuterium exchange mass spectrometry (HDX-MS) is a powerful technique for probing protein structure and dynamics. Because it can study almost any protein without a size limit, including intrinsically disordered ones, HDX-MS has become increasingly important as a complement to structural elucidation techniques. Current experiments compare two or more related conditions (sequences, interaction partners, excipients, conformational states, etc.
Sci Adv
January 2025
Key Laboratory of Plant Carbon Capture, Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
Plants sense and respond to hyperosmotic stress through rapid activation of sucrose nonfermenting 1-related protein kinase 2 (SnRK2). Under unstressed conditions, clade A protein phosphatases type 2C (PP2Cs) interact with and inhibit subgroup III SnRK2s, which are released from PP2C inhibition by pyrabactin resistance 1-like (PYL) abscisic acid receptors. However, how SnRK2s are released under osmotic stress is unclear.
Sci Rep
January 2025
Department of Computer Science, Xi'an University of Architecture and Technology, Xi'an, 710055, Shaanxi Province, China.
The attention mechanism has driven significant progress in various point cloud tasks. Because attention is highly effective at capturing long-range dependencies, research on point cloud completion has achieved promising results. However, point cloud data are typically disordered, feature complicated non-Euclidean geometric structures, and exhibit unpredictable behavior.
Nucleic Acids Res
January 2025
Biomolecular Sciences Institute, Florida International University, Miami, FL 33199, United States.
The mammalian high mobility group protein AT-hook 2 (HMGA2) is a small DNA-binding protein that specifically targets AT-rich DNA sequences. Structurally, HMGA2 is an intrinsically disordered protein (IDP), comprising three positively charged 'AT-hooks' and a negatively charged C-terminus. HMGA2 can form homodimers through electrostatic interactions between its 'AT-hooks' and C-terminus.