Proteins have proven to be useful agents in a variety of fields, from serving as potent therapeutics to enabling complex catalysis for chemical manufacture. However, they remain difficult to design and are instead typically obtained through extensive screens or directed evolution. Recent developments in protein large language models have enabled fast generation of diverse protein sequences in unexplored regions of protein space that are predicted to fold into varied structures, bind relevant targets, and catalyze novel reactions. Nevertheless, we lack methods to characterize these proteins experimentally at scale and to update generative models based on those results. We describe Protein CREATE (Computational Redesign via an Experiment-Augmented Training Engine), an integrated computational and experimental pipeline. Its experimental workflow combines next-generation sequencing and phage display with single-molecule readouts to collect large amounts of quantitative binding data for updating protein large language models. We use Protein CREATE to generate and assay thousands of designed binders to the IL-7 receptor and the insulin receptor, with parallel positive and negative selections to identify on-target binders. We discover not only individual novel binders but also features of ligand-receptor binding, including the specific preservation of the IL7R-ligand hydrophobic interface and the existence of multiple approaches to contacting the insulin receptor. We also demonstrate the importance of structural features, such as the lack of unpaired cysteine residues, for design fidelity. We find that computational pre-screening metrics, such as interchain predicted TM scoring (iPTM), are useful but imperfect predictors: they neither guarantee experimental binding nor rule it out. We use the data collected from Protein CREATE to score designs from the initial generative models. More broadly, Protein CREATE will power future closed-loop design-build-test cycles to enable fine-grained design of protein binders.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11722223
DOI: http://dx.doi.org/10.1101/2024.12.20.629847
