Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.

Source
http://dx.doi.org/10.1038/s41592-024-02504-2