More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins.
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations.
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology.
Transcriptional programming of the innate immune response is pivotal for host protection. However, the transcriptional mechanisms that link pathogen sensing with innate activation remain poorly understood. During HIV-1 infection, human dendritic cells (DCs) can detect the virus through an innate sensing pathway, leading to the production of antiviral interferon and to DC maturation.
We have generalized the derivation of genetic-interaction networks from quantitative phenotype data. Familiar and unfamiliar modes of genetic interaction were identified and defined. A network was derived from agar-invasion phenotypes of mutant yeast.
We investigated the organization of interacting proteins and protein complexes into networks of modules. A network-clustering method was developed to identify modules. This method of network-structure determination was validated by clustering known signaling-protein modules and by identifying module rudiments in exclusively high-throughput protein-interaction data with high error frequencies and low coverage.