There are more ways to synthesize a 100-amino acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10^10, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
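
The kind of energy model described above can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a simple two-state folding model in which the phenotype is the fraction of folded protein, a sign convention where negative folding free energy favours the folded state, and RT at 298 K. The functions `predict_dg` and `fraction_folded`, the mutation labels (`A10V`, `L20P`, `K30E`) and all free energy values are hypothetical and chosen only for illustration. Free energies add up (plus a sparse set of pairwise couplings), while the Boltzmann link makes the phenotype a nonlinear function of them.

```python
import math
from itertools import combinations

# Assumed constant: RT in kcal/mol at 298 K (R = 1.987e-3 kcal/mol/K).
RT = 1.987e-3 * 298.0

def fraction_folded(dg_fold: float, rt: float = RT) -> float:
    """Two-state (Boltzmann) link from folding free energy to phenotype.
    Assumed sign convention: dg_fold < 0 favours the folded state."""
    return 1.0 / (1.0 + math.exp(dg_fold / rt))

def predict_dg(variant, dg_wt, ddg_single, ddg_pair):
    """Additive free energy model plus sparse pairwise energetic couplings.

    variant    -- iterable of mutation labels such as "A10V" (hypothetical)
    ddg_single -- additive free energy change for each single mutation
    ddg_pair   -- couplings keyed by alphabetically sorted label pairs;
                  pairs absent from the dict are treated as uncoupled (0.0)
    """
    muts = sorted(set(variant))
    dg = dg_wt + sum(ddg_single[m] for m in muts)
    for a, b in combinations(muts, 2):
        dg += ddg_pair.get((a, b), 0.0)
    return dg

# Illustrative, made-up parameters in kcal/mol (not fitted values).
DG_WT = -3.0
DDG_SINGLE = {"A10V": 1.0, "L20P": 2.5, "K30E": -0.5}
DDG_PAIR = {("A10V", "L20P"): 0.8}  # the only coupling in this sparse example

for variant in [(), ("A10V",), ("L20P",), ("A10V", "L20P")]:
    dg = predict_dg(variant, DG_WT, DDG_SINGLE, DDG_PAIR)
    print(variant, round(dg, 2), round(fraction_folded(dg), 3))

# Size of the sequence space for a 100-aa protein, as a power of ten:
# 20**100 is about 10**130, far beyond the ~10**80 atoms usually cited.
print(round(100 * math.log10(20), 1))
```

With these toy numbers, the same +1.0 kcal/mol change from `A10V` barely alters the folded fraction of the very stable wild type but shifts it strongly in the destabilized `L20P` background, which is how a model with additive energies can still produce nonadditive phenotypes.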


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499273
DOI: http://dx.doi.org/10.1038/s41586-024-07966-0

Similar Publications

Peptide therapeutics, a major class of medicines, have achieved remarkable success across diseases such as diabetes and cancer, with landmark examples such as GLP-1 receptor agonists revolutionizing the treatment of type-2 diabetes and obesity. Despite their success, designing peptides that satisfy multiple conflicting objectives, such as target binding affinity, solubility, and membrane permeability, remains a major challenge. Classical drug development and structure-based design are ineffective for such tasks, as they fail to optimize global functional properties critical for therapeutic efficacy.

Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis.

Hibernation, an adaptive mechanism to extreme environmental conditions, is prevalent among mammals. Its main characteristics include reduced body temperature and metabolic rate. However, the mechanisms by which hibernating animals re-enter deep sleep during the euthermic phase to sustain hibernation remain poorly understood.

Short linear peptide motifs play important roles in cell signaling. They can act as modification sites for enzymes and as recognition sites for peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence.

The reflexive translation of symbols in one chemical language to another defined genetics. Yet, the co-linearity of codons and amino acids is so commonplace an idea that few even ask how it arose. Readout is done by two distinct sets of proteins, called aminoacyl-tRNA synthetases (AARS).
