Proc Natl Acad Sci U S A
January 2025
Selective pressure acts on codon usage, optimizing multiple, overlapping signals that are only partially understood. We trained AI models to predict codons from the amino acid sequence in eukaryotes and bacteria, and to study the extent to which we can learn patterns in naturally occurring codons to improve predictions. We trained our models on a subset of the proteins and evaluated their predictions on large, separate sets of proteins of varying lengths and expression levels.
Protein space is characterized by extensive recurrence, or "reuse," of parts, suggesting that new proteins and domains can evolve by mixing-and-matching of existing segments. From an evolutionary perspective, for a given combination to persist, the protein segments should presumably not only match geometrically but also dynamically communicate with each other to allow concerted motions that are key to function. Evidence from protein space supports the premise that domains indeed combine in this manner; we explore whether a similar phenomenon can be observed at the sub-domain level.
Amyloids, protein and peptide assemblies found in various organisms, are crucial in physiological and pathological processes. Their intricate structures, however, present significant challenges, limiting our understanding of their functions, regulatory mechanisms, and potential applications in biomedicine and technology. This study evaluated the AlphaFold2 ColabFold method's structure predictions for antimicrobial amyloids, using eight antimicrobial peptides (AMPs), including those with experimentally determined structures and AMPs known for their distinct amyloidogenic morphological features.
The emergence of novel proteins, beyond those that can be readily made by duplication and recombination of pre-existing domains, is elusive. De novo emergence from random sequences is unlikely because the vast majority of random chains would not even fold, let alone function. An alternative explanation is that novel proteins emerge by duplication and fusion of pre-existing polypeptide segments.
Accurate protein structure predictors use clusters of homologues, which disregard sequence-specific effects. In this issue of Structure, Weißenow and colleagues report a deep learning-based tool, EMBER2, that efficiently predicts the distances in a protein structure from its amino acid sequence only. This approach should enable the analysis of mutation effects.