Designing realistic regulatory DNA with autoregressive language models.

Genome Res

Biology Research|AI Development, gRED Computational Sciences, Genentech, South San Francisco, California 94080, USA;

Published: October 2024

Cis-regulatory elements (CREs), such as promoters and enhancers, are DNA sequences that regulate the expression of genes. The activity of a CRE is influenced by the order, composition, and spacing of sequence motifs that are bound by proteins called transcription factors (TFs). Synthetic CREs with specific properties are needed for biomanufacturing as well as for many therapeutic applications including cell and gene therapy. Here, we present regLM, a framework to design synthetic CREs with desired properties, such as high, low, or cell type-specific activity, using autoregressive language models in conjunction with supervised sequence-to-function models. We used our framework to design synthetic yeast promoters and cell type-specific human enhancers. We demonstrate that the synthetic CREs generated by our approach are not only predicted to have the desired functionality but also contain biological features similar to experimentally validated CREs. regLM thus facilitates the design of realistic regulatory DNA elements while providing insights into the cis-regulatory code.
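The generate-then-filter idea in the abstract (an autoregressive model proposes sequences, a sequence-to-function model checks them) can be illustrated with a toy sketch. This is not regLM's implementation: the first-order Markov sampler only stands in for an autoregressive language model, the motif counter only stands in for a trained sequence-to-function model, and every name and number below (`sample_sequence`, `predicted_activity`, `design`, `TRANSITION`, the `TATA` motif, the thresholds) is a hypothetical chosen for illustration.

```python
import random

ALPHABET = "ACGT"

# Hypothetical transition weights: key = previous base, values in A, C, G, T order.
# A real autoregressive language model would condition on the whole prefix.
TRANSITION = {
    "A": [1, 1, 1, 3],
    "C": [1, 1, 1, 1],
    "G": [1, 1, 1, 1],
    "T": [3, 1, 1, 1],
}


def sample_sequence(length, transition, rng):
    """Sample one base at a time, each conditioned on the previous base."""
    seq = [rng.choice(ALPHABET)]
    while len(seq) < length:
        weights = transition[seq[-1]]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)


def predicted_activity(seq, motif="TATA"):
    """Placeholder scorer: count (possibly overlapping) motif occurrences.

    Stands in for a supervised sequence-to-function model.
    """
    last_start = len(seq) - len(motif)
    return sum(seq.startswith(motif, i) for i in range(last_start + 1))


def design(n_candidates, length, transition, min_score, seed=0):
    """Generate candidates, then keep those the scorer rates at or above min_score."""
    rng = random.Random(seed)
    candidates = [sample_sequence(length, transition, rng) for _ in range(n_candidates)]
    return [s for s in candidates if predicted_activity(s) >= min_score]


designs = design(n_candidates=200, length=80, transition=TRANSITION, min_score=1)
```

Separating the generator from the scorer mirrors the division of labor the abstract describes: the generator is responsible for producing plausible sequences, while an independent model judges whether each candidate has the desired property.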


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529870
DOI: http://dx.doi.org/10.1101/gr.279142.124

