How can a single protein domain encode a conformational landscape with multiple stably folded states, and how do those states interconvert? Here, we use real-time and relaxation-dispersion NMR to characterize the conformational landscape of the circadian rhythm protein KaiB from . Unique among known natural metamorphic proteins, this KaiB variant spontaneously interconverts between two monomeric states: the "Ground" and "Fold-switched" (FS) states. KaiB in its FS state interacts with multiple binding partners, including the central KaiC protein, to regulate circadian rhythms.
Proc Natl Acad Sci U S A
November 2024
Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a "categorical Jacobian" calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models.
Designing single molecules that compute general functions of input molecular partners represents a major unsolved challenge in molecular design. Here, we demonstrate that high-throughput, iterative experimental testing of diverse RNA designs crowdsourced from Eterna yields sensors of increasingly complex functions of input oligonucleotide concentrations. After designing single-input RNA sensors with activation ratios beyond our detection limits, we created logic gates, including challenging XOR and XNOR gates, and sensors that respond to the ratio of two inputs.
AlphaFold2 (ref. ) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates, and disease-causing point mutations often cause population changes within these substates.
Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments.
Internet-based scientific communities promise a means to apply distributed, diverse human intelligence toward previously intractable scientific problems. However, current implementations have not allowed communities to propose experiments to test all emerging hypotheses at scale or to modify hypotheses in response to experiments. We report high-throughput methods for molecular characterization of nucleic acids that enable the large-scale video game–based crowdsourcing of RNA sensor design, followed by high-throughput functional characterization.
Therapeutic mRNAs and vaccines are being developed for a broad range of human diseases, including COVID-19. However, their optimization is hindered by mRNA instability and inefficient protein expression. Here, we describe design principles that overcome these barriers.
Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics.
RNA hydrolysis presents problems in manufacturing, long-term storage, world-wide delivery and in vivo stability of messenger RNA (mRNA)-based vaccines and therapeutics. A largely unexplored strategy to reduce mRNA hydrolysis is to redesign RNAs to form double-stranded regions, which are protected from in-line cleavage and enzymatic degradation, while coding for the same proteins. The amount of stabilization that this strategy can deliver and the most effective algorithmic approach to achieve stabilization remain poorly understood.
As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nt as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak.
The residence time of a drug on its target has been suggested as a more pertinent metric of therapeutic efficacy than the traditionally used affinity constant. Here, we introduce junctured-DNA tweezers as a generic platform that enables real-time observation, at the single-molecule level, of biomolecular interactions. This tool corresponds to a double-strand DNA scaffold that can be nanomanipulated and on which proteins of interest can be engrafted thanks to widely used genetic tagging strategies.
As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the time scale of the latent space while inferring a reduced coordinate, which assists in finding slow processes according to the variational approach to conformational dynamics. We provide evidence that the VDE framework [Hernández , Phys. Rev.
Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged covariate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of the variational autoencoder (VAE), which is able to compress complex datasets into simpler manifolds.
J Chem Theory Comput
April 2018
Variational autoencoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single nonlinear embedding. In this work, we illustrate how this nonlinear latent embedding can be used as a collective variable for enhanced sampling and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning about a model using AMBER99.
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models.
In the standard DNA brick set-up, distinct 32-nucleotide strands of single-stranded DNA are each designed to bind specifically to four other such molecules. Experimentally, it has been demonstrated that the overall yield is increased if certain bricks which occur on the outer faces of target structures are merged with adjacent bricks. However, it is not well understood by what mechanism such 'boundary bricks' increase the yield, as they likely influence both the nucleation process and the final stability of the target structure.
Aluminum has attracted great attention recently as it has been suggested by several studies to be associated with increased risks for Alzheimer's and Parkinson's disease. The toxicity of the trivalent ion is assumed to derive from structural changes induced in lipid bilayers upon binding, though the mechanism of this process is still not well understood. In the present study we elucidate the effect of Al³⁺ on supported lipid bilayers (SLBs) using fluorescence microscopy, the quartz crystal microbalance with dissipation (QCM-D) technique, dual-polarization interferometry (DPI), and molecular dynamics (MD) simulations.
Understanding the kinetics of dye adsorption and desorption on semiconductors is crucial for optimizing the performance of dye-sensitized solar cells (DSSCs). Quartz crystal microbalance with dissipation monitoring (QCM-D) measures adsorbed mass in real time, allowing determination of binding kinetics. In this work, we characterize adsorption of the common RuBipy dye N3 to the native oxide layer of a planar, sputter-coated titanium surface, simulating the TiO2 substrate of a DSSC.