The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model.
The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells.
Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and untargeted cells.
Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes. We recently discovered that IS110 family elements encode a recombinase and a non-coding bridge RNA (bRNA) that confers modular specificity for target DNA and donor DNA through two programmable loops. Here we report the cryo-electron microscopy structures of the IS110 recombinase in complex with its bRNA, target DNA and donor DNA in three different stages of the recombination reaction cycle.
Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes that are involved in fundamental DNA repair processes, such as homologous recombination, or in the transposition of foreign genetic material by viruses and mobile genetic elements. Here we report that IS110 insertion sequences, a family of minimal and autonomous mobile genetic elements, express a structured non-coding RNA that binds specifically to their encoded recombinase.
Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions, or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes involved in fundamental DNA repair processes such as homologous recombination or in the transposition of foreign genetic material by viruses and mobile genetic elements (MGEs). We report that IS110 insertion sequences, a family of minimal and autonomous MGEs, express a structured non-coding RNA that binds specifically to their encoded recombinase.
Effective and precise mammalian transcriptome engineering technologies are needed to accelerate biological discovery and RNA therapeutics. Despite the promise of programmable CRISPR-Cas13 ribonucleases, their utility has been hampered by an incomplete understanding of guide RNA design rules and cellular toxicity resulting from off-target or collateral RNA cleavage. Here, we quantified the performance of over 127,000 RfxCas13d (CasRx) guide RNAs and systematically evaluated seven machine learning models to build a guide efficiency prediction algorithm orthogonally validated across multiple human cell types.
Identification of host determinants of coronavirus infection informs mechanisms of pathogenesis and may provide novel therapeutic targets. Here, we demonstrate that the histone demethylase KDM6A promotes infection of diverse coronaviruses, including SARS-CoV, SARS-CoV-2, MERS-CoV and mouse hepatitis virus (MHV) in a demethylase activity-independent manner. Mechanistic studies reveal that KDM6A promotes viral entry by regulating expression of multiple coronavirus receptors, including ACE2, DPP4 and Ceacam1.
Identifying host genes essential for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has the potential to reveal novel drug targets and further our understanding of Coronavirus Disease 2019 (COVID-19). We previously performed a genome-wide CRISPR/Cas9 screen to identify proviral host factors for highly pathogenic human coronaviruses. Few host factors were required by diverse coronaviruses across multiple cell types, but DYRK1A was one such exception.
The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications.
Large serine recombinases (LSRs) are DNA integrases that facilitate the site-specific integration of mobile genetic elements into bacterial genomes. Only a few LSRs, such as Bxb1 and PhiC31, have been characterized to date, with limited efficiency as tools for DNA integration in human cells. In this study, we developed a computational approach to identify thousands of LSRs and their DNA attachment sites, expanding known LSR diversity by >100-fold and enabling the prediction of their insertion site specificities.
Rapid nucleic acid testing is central to infectious disease surveillance. Here, we report an assay for rapid COVID-19 testing and its implementation in a prototype microfluidic device. The assay, which we named DISCoVER (for diagnostics with coronavirus enzymatic reporting), involves extraction-free sample lysis via shelf-stable and low-cost reagents, multiplexed isothermal RNA amplification followed by T7 transcription, and Cas13-mediated cleavage of a quenched fluorophore.
Direct, amplification-free detection of RNA has the potential to transform molecular diagnostics by enabling simple on-site analysis of human or environmental samples. CRISPR-Cas nucleases offer programmable RNA-guided RNA recognition that triggers cleavage and release of a fluorescent reporter molecule, but long reaction times hamper their detection sensitivity and speed. Here, we show that unrelated CRISPR nucleases can be deployed in tandem to provide both direct RNA sensing and rapid signal generation, thus enabling robust detection of ~30 molecules per µl of RNA in 20 min.
Direct, amplification-free detection of RNA has the potential to transform molecular diagnostics by enabling simple on-site analysis of human or environmental samples. CRISPR-Cas nucleases offer programmable RNA-guided recognition of RNA that triggers cleavage and release of a fluorescent reporter molecule, but long reaction times hamper sensitivity and speed when applied to point-of-care testing. Here we show that unrelated CRISPR nucleases can be deployed in tandem to provide both direct RNA sensing and rapid signal generation, thus enabling robust detection of ~30 RNA copies/microliter in 20 minutes.
Rapid nucleic acid testing is a critical component of a robust infrastructure for increased disease surveillance. Here, we report a microfluidic platform for point-of-care, CRISPR-based molecular diagnostics. We first developed a nucleic acid test which pairs distinct mechanisms of DNA and RNA amplification optimized for high sensitivity and rapid kinetics, linked to Cas13 detection for specificity.
CRISPR-Cas genome editing technologies have revolutionized the fields of functional genetics and genome engineering, but with the recent discovery and optimization of RNA-targeting Cas ribonucleases, we may soon see a similar revolution in the study of RNA function and transcriptome engineering. However, to date, successful proof of principle for Cas ribonuclease RNA targeting in eukaryotic systems has been limited. Only recently has successful modification of RNA expression by a Cas ribonuclease been demonstrated in animal embryos.
View Article and Find Full Text PDFBackground: Serological tests are crucial tools for assessments of SARS-CoV-2 exposure, infection and potential immunity. Their appropriate use and interpretation require accurate assay performance data.
Method: We conducted an evaluation of 10 lateral flow assays (LFAs) and two ELISAs to detect anti-SARS-CoV-2 antibodies.
CRISPR-Cas endonucleases directed against foreign nucleic acids mediate prokaryotic adaptive immunity and have been tailored for broad genetic engineering applications. Type VI-D CRISPR systems contain the smallest known family of single effector Cas enzymes, and their signature Cas13d ribonuclease employs guide RNAs to cleave matching target RNAs. To understand the molecular basis for Cas13d function and explain its compact molecular architecture, we resolved cryoelectron microscopy structures of Cas13d-guide RNA binary complex and Cas13d-guide-target RNA ternary complex to 3.