How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence-which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis-or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a miniscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11390738 | PMC |
http://dx.doi.org/10.1038/s41467-024-51895-5 | DOI Listing |
bioRxiv
December 2024
Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
We recently reanalyzed 20 combinatorial mutagenesis datasets using a novel reference-free analysis (RFA) method and showed that high-order epistasis contributes negligibly to protein sequence-function relationships in every case. Dupic, Phillips, and Desai (DPD) commented on a preprint of our work. In our published paper, we addressed all the major issues they raised, but we respond directly to them here.
View Article and Find Full Text PDFMol Biol Evol
November 2024
Laboratory of Genetics, J. F. Crow Institute for the Study of Evolution, Center for Genomic Science Innovation, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI 53726, USA.
Front Mol Biosci
October 2024
Department of Pharmaceutical and Pharmacological Sciences, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium.
Directed evolution is a powerful tool that can bypass gaps in our understanding of the sequence-function relationship of proteins and still isolate variants with desired activities, properties, and substrate specificities. The rise of directed evolution platforms for polymerase engineering has accelerated the isolation of xenobiotic nucleic acid (XNA) synthetases and reverse transcriptases capable of processing a wide array of unnatural XNAs which have numerous therapeutic and biotechnological applications. Still, the current generation of XNA polymerases functions with significantly lower efficiency than the natural counterparts and retains a significant level of DNA polymerase activity which limits their applications.
View Article and Find Full Text PDFCurr Opin Microbiol
December 2024
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA. Electronic address:
Bacterial operons often contain intergenic transcription terminators that terminate some, but not all, RNA polymerase molecules. In these operons, the level of terminator readthrough determines downstream gene expression and helps establish protein ratios among co-regulated genes. Despite its critical role in maintaining stoichiometric gene expression, terminator strength remains difficult to predict from DNA sequence.
View Article and Find Full Text PDFbioRxiv
September 2024
Department of Biology, University of Florida, Gainesville, FL, 32611.
Epistasis complicates our understanding of protein sequence-function relationships and impedes our ability to build accurate predictive models for novel genotypes. Although pairwise epistasis has been extensively studied in proteins, the significance of higher-order epistasis for protein sequence-function relationships remains contentious, largely due to challenges in fitting higher-order epistatatic interactions for full-length proteins. Here, we introduce a novel transformer-based approach.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!