Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
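As a minimal illustration of the gauge freedom described above, consider an additive model, in which the predicted activity of a sequence is a constant plus one parameter per position and character. This sketch is not the derivation in the paper. The sequence length, alphabet size, parameter values, and wild-type sequence are all hypothetical, and the two gauges shown (often called the "zero-sum" and "wild-type" gauges) are assumed here as examples of commonly used conventions. Shifting all parameters at one position by a constant, and compensating in the constant term, changes the parameter values but not any prediction.

```python
import itertools
import numpy as np

def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per (position, character)."""
    return theta0 + sum(theta[l, c] for l, c in enumerate(seq))

def zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to mean zero; absorb the means into theta0."""
    means = theta.mean(axis=1, keepdims=True)
    return theta0 + means.sum(), theta - means

def wild_type_gauge(theta0, theta, wt):
    """Shift each position's parameters so the wild-type character contributes zero."""
    offsets = theta[np.arange(len(wt)), wt][:, None]
    return theta0 + offsets.sum(), theta - offsets

rng = np.random.default_rng(0)
L, A = 3, 4                      # hypothetical sequence length and alphabet size
theta0 = 1.5                     # hypothetical constant term
theta = rng.normal(size=(L, A))  # hypothetical additive parameters
wt = [0, 2, 1]                   # hypothetical wild-type sequence (character indices)

# Evaluate the model on every possible sequence under each parameterization.
all_seqs = list(itertools.product(range(A), repeat=L))
original = np.array([predict(theta0, theta, s) for s in all_seqs])

zs0, zs = zero_sum_gauge(theta0, theta)
wt0, wtp = wild_type_gauge(theta0, theta, wt)
zs_pred = np.array([predict(zs0, zs, s) for s in all_seqs])
wt_pred = np.array([predict(wt0, wtp, s) for s in all_seqs])

# Parameters differ between gauges, but predictions agree on all sequences.
print(np.allclose(original, zs_pred), np.allclose(original, wt_pred))
print(np.allclose(theta, zs), np.allclose(theta, wtp))
```

In the zero-sum gauge the constant term is the mean prediction over all sequences, whereas in the wild-type gauge it is the prediction for the wild-type sequence. The same model therefore supports different readings of its parameters depending on the gauge chosen.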
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11118547 | PMC |
| http://dx.doi.org/10.1101/2024.05.12.593772 | DOI Listing |
bioRxiv
December 2024
Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
We recently reanalyzed 20 combinatorial mutagenesis datasets using a novel reference-free analysis (RFA) method and showed that high-order epistasis contributes negligibly to protein sequence-function relationships in every case. Dupic, Phillips, and Desai (DPD) commented on a preprint of our work. In our published paper, we addressed all the major issues they raised, but we respond directly to them here.
Mol Biol Evol
November 2024
Laboratory of Genetics, J. F. Crow Institute for the Study of Evolution, Center for Genomic Science Innovation, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI 53726, USA.
Front Mol Biosci
October 2024
Department of Pharmaceutical and Pharmacological Sciences, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium.
Directed evolution is a powerful tool that can bypass gaps in our understanding of the sequence-function relationship of proteins and still isolate variants with desired activities, properties, and substrate specificities. The rise of directed evolution platforms for polymerase engineering has accelerated the isolation of xenobiotic nucleic acid (XNA) synthetases and reverse transcriptases capable of processing a wide array of unnatural XNAs, which have numerous therapeutic and biotechnological applications. Still, the current generation of XNA polymerases functions with significantly lower efficiency than their natural counterparts and retains a significant level of DNA polymerase activity, which limits their applications.
Curr Opin Microbiol
December 2024
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
Bacterial operons often contain intergenic transcription terminators that terminate some, but not all, RNA polymerase molecules. In these operons, the level of terminator readthrough determines downstream gene expression and helps establish protein ratios among co-regulated genes. Despite its critical role in maintaining stoichiometric gene expression, terminator strength remains difficult to predict from DNA sequence.
bioRxiv
September 2024
Department of Biology, University of Florida, Gainesville, FL, 32611.
Epistasis complicates our understanding of protein sequence-function relationships and impedes our ability to build accurate predictive models for novel genotypes. Although pairwise epistasis has been extensively studied in proteins, the significance of higher-order epistasis for protein sequence-function relationships remains contentious, largely due to challenges in fitting higher-order epistatic interactions for full-length proteins. Here, we introduce a novel transformer-based approach.