Stereotact Funct Neurosurg
November 2024
Introduction: One of the challenges in directional deep brain stimulation (DBS) is to determine the orientation of implanted electrodes relative to targeted regions. Post-operative images must be aligned with a model of the implanted lead, usually a computer-based model provided by the manufacturer. This paper shows that models can alternatively be obtained by capturing images of individual leads using micro-CT, a high-resolution CT technique.
Throughout evolution, protein families undergo substantial sequence divergence while preserving structure and function. Although most mutations are deleterious, evolution can explore sequence space via epistatic networks of intramolecular interactions that alleviate the harmful mutations. However, comprehensive analysis of such epistatic networks across protein families remains limited.
We introduce a data-driven epistatic model of protein evolution, capable of generating evolutionary trajectories spanning very different time scales, from individual mutations to diverged homologs. Our in silico evolution encompasses random nucleotide mutations, insertions and deletions, and models selection using a fitness landscape, which is inferred via a generative probabilistic model for protein families. We show that the proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting applicability also to relatively data-poor intermediate evolutionary time scales, which are currently inaccessible to evolution experiments.
RNA ribozyme (Walter Engelke, Biologist (London, England) 49:199-203, 2002) datasets typically contain from a few hundred to a few thousand naturally occurring sequences. However, the potential sequence space of RNA is huge. For example, the number of possible RNA sequences of length 150 nucleotides is 4^150, or approximately 2 × 10^90, a figure that far surpasses the estimated number of atoms in the known universe, which is around 10^80.
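The size of the sequence space quoted above can be checked with a few lines of exact integer arithmetic. This is only a sketch; it assumes the standard four-letter RNA alphabet (A, C, G, U), and the function names are illustrative rather than taken from the paper.

```python
# Assumption: standard four-letter RNA alphabet (A, C, G, U).
ALPHABET_SIZE = 4


def sequence_space_size(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct sequences of the given length (exact integer)."""
    return alphabet_size ** length


def order_of_magnitude(n):
    """Base-10 order of magnitude of a positive integer."""
    return len(str(n)) - 1


n_sequences = sequence_space_size(150)
print(f"4^150 is of order 10^{order_of_magnitude(n_sequences)}")
```

Python integers have arbitrary precision, so the count is computed exactly rather than through floating-point logarithms.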
Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters.
Metabolic magnetic resonance imaging (MRI) using hyperpolarized (HP) pyruvate is becoming a non-invasive technique for diagnosing, staging, and monitoring response to treatment in cancer and other diseases. The clinically established method for producing HP pyruvate, dissolution dynamic nuclear polarization, however, is rather complex and slow. Signal Amplification By Reversible Exchange (SABRE) is an ultra-fast and low-cost method based on fast chemical exchange.
Motivation: Being able to artificially design novel proteins of desired function is pivotal in many biological and biomedical applications. Generative statistical modeling has recently emerged as a new paradigm for designing amino acid sequences, including in particular models and embedding methods borrowed from natural language processing (NLP). However, most approaches target single proteins or protein domains, and do not take into account any functional specificity or interaction with the context.
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species.
Dehydroamino acids are important structural motifs and biosynthetic intermediates for natural products. Many bioactive natural products of nonribosomal origin contain dehydroamino acids; however, the biosynthesis of dehydroamino acids in most nonribosomal peptides is not well understood. Here, we provide biochemical and bioinformatic evidence in support of the role of a unique class of condensation domains in dehydration (C).
Characterizing the effect of mutations is key to understanding the evolution of protein sequences and to separating neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term.
Proc Natl Acad Sci U S A
January 2022
The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses that predate SARS-CoV-2, we build statistical models that not only capture amino acid conservation but also more complex patterns resulting from epistasis.
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins to propose stochastic models of experimental protein evolution. These models quantitatively predict important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra.
BMC Bioinformatics
October 2021
Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain.
Generative models emerge as promising candidates for novel sequence-data driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 10 and 10).
Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites.
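To make the roles of the local fields and two-site couplings concrete, here is a minimal sketch of a pairwise Potts energy. It assumes the common sign convention E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), with lower energy meaning higher probability. The data layout, the function name `potts_energy`, and the toy parameter values are illustrative assumptions, not the paper's implementation.

```python
def potts_energy(seq, fields, couplings):
    """Energy of an integer-encoded sequence under a pairwise Potts model.

    seq       : list of state indices, one per site
    fields    : fields[i][a] is the local field h_i(a)
    couplings : dict mapping (i, j) with i < j to a matrix J_ij[a][b];
                pairs that are absent from the dict contribute nothing
    Assumed convention: E = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    """
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= fields[i][a]
        for j in range(i + 1, len(seq)):
            pair = couplings.get((i, j))
            if pair is not None:
                energy -= pair[a][seq[j]]
    return energy


# Toy model (made-up numbers): two sites, two states per site.
toy_fields = [[0.5, -0.5], [0.0, 1.0]]
toy_couplings = {(0, 1): [[1.0, 0.0], [0.0, 2.0]]}

print(potts_energy([0, 0], toy_fields, toy_couplings))
print(potts_energy([1, 1], toy_fields, toy_couplings))
```

Storing the couplings in a dictionary keyed by site pairs keeps the sketch usable for sparse models, where most pairs carry no coupling at all.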
Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences.
Crabs of the family Camptandriidae are the most dominant burrowing crabs inhabiting arid mangrove forests of the Persian Gulf. They play important roles in the structuring and functioning of mangrove ecosystems by modulating biogeochemical processes and cycling of nutrients, serving as important ecosystem engineers. We analysed stable carbon (C) and nitrogen (N) isotope values of three camptandriid crabs (, and ) and their potential food sources in the Hara Biosphere Reserve, northern Persian Gulf.
Epigenetics Chromatin
January 2021
Splicing factors have recently been shown to be involved in heterochromatin formation, but their role in controlling heterochromatin structure and function remains poorly understood. In this study, we identified a fission yeast homologue of human splicing factor RBM10, which has been linked to TARP syndrome. Overexpression of Rbm10 in fission yeast leads to strong global intron retention.
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e.
Predicting three-dimensional protein structure and assembling protein complexes using sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made in the case of single proteins using a combination of unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While reaching impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages.
The rational design of enzymes is an important goal for both fundamental and practical reasons. Here, we describe a process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay. For chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids, we demonstrate the design of natural-like catalytic function with substantial sequence diversity.