The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
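The "differences relative to the nearest training set sequence" reported above can be illustrated with a minimal sketch. It assumes that the variant and the training sequences are already aligned to the same length and that a difference means a mismatched position (Hamming distance); the abstract does not state how the count was computed, so this is an assumption. The function names and the toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_training_distance(variant: str, training: list[str]) -> int:
    """Smallest Hamming distance from a variant to any training sequence."""
    return min(hamming(variant, t) for t in training)


# Toy, made-up sequences for illustration only (not luxA data).
training_set = ["ACDEFG", "ACDKLG"]
variant = "ACDEYG"
distance = nearest_training_distance(variant, training_set)
print(distance)
```

With unaligned sequences of different lengths, an edit distance or a pairwise alignment would be needed instead of a position-wise comparison.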
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946179 | PMC |
http://dx.doi.org/10.1371/journal.pcbi.1008736 | DOI Listing |
ACS Synth Biol
December 2023
Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States.
Deep generative models (DGMs) have shown great success in the understanding and data-driven design of proteins. Variational autoencoders (VAEs) are a popular DGM approach that can learn the correlated patterns of amino acid mutations within a multiple sequence alignment (MSA) of protein sequences and distill this information into a low-dimensional latent space to expose phylogenetic and functional relationships and guide generative protein design. Autoregressive (AR) models are another popular DGM approach that typically lacks a low-dimensional latent embedding but does not require training sequences to be aligned into an MSA and enables the design of variable-length proteins.
Entropy (Basel)
October 2023
Data Science Institute, Reichman University, Herzliya 4610101, Israel.
Variational inference provides a way to approximate probability densities through optimization. It does so by optimizing an upper or a lower bound of the likelihood of the observed data (the evidence). The classic variational inference approach suggests maximizing the Evidence Lower Bound (ELBO).
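To make the lower bound concrete, here is a small sketch on a toy model chosen for illustration (it is not taken from the article): a latent z ~ N(0, 1) with observation x | z ~ N(z, 1), and a Gaussian variational distribution q(z) = N(m, s2). For this model the ELBO and the log evidence both have closed forms, so the bound can be checked directly. All names are hypothetical.

```python
import math


def elbo(x: float, m: float, s2: float) -> float:
    """Closed-form ELBO for z ~ N(0,1), x|z ~ N(z,1), q(z) = N(m, s2)."""
    expected_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    expected_log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy_q = 0.5 * math.log(2 * math.pi * math.e * s2)
    return expected_log_lik + expected_log_prior + entropy_q


def log_evidence(x: float) -> float:
    """Exact log p(x): marginally x ~ N(0, 2) in this toy model."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0


x = 1.0
# A poorly matched q gives a value strictly below the log evidence.
loose = elbo(x, m=0.0, s2=1.0)
# The exact posterior is N(x/2, 1/2); using it as q makes the bound tight.
tight = elbo(x, m=x / 2, s2=0.5)
print(loose, tight, log_evidence(x))
```

Maximizing the ELBO over m and s2 therefore recovers the exact posterior in this conjugate case; in a VAE the same objective is optimized with neural-network parameterizations and sampling instead of closed forms.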
Comput Struct Biotechnol J
November 2022
Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany.
The process of designing biomolecules, in particular proteins, is witnessing a rapid change in available tooling and approaches, moving from design through physicochemical force fields, to producing plausible, complex sequences fast via end-to-end differentiable statistical models. To achieve conditional and controllable protein design, researchers at the interface of artificial intelligence and biology leverage advances in natural language processing (NLP) and computer vision techniques, coupled with advances in computing hardware to learn patterns from growing biological databases, curated annotations thereof, or both. Once learned, these patterns can be used to provide novel insights into mechanistic biology and the design of biomolecules.
PLoS Comput Biol
February 2021
Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris, France.
The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase.
Childs Nerv Syst
August 2015
Department of Neurosurgery, Treviso Hospital, University of Padova, Piazza Ospedale 1, 31100, Treviso, Italy.
Purpose: Although the utility of the sitting position is undisputed for biomechanical and ergonomic reasons, it has been debated in recent years because of its risks, particularly venous air embolism (VAE). To reduce the hemodynamic effect of VAE, we changed the composition of the surgical field air, partially replacing nitrogen with carbon dioxide (CO2), which dissolves better in human tissues.
Methods: First, we tested our method on a test dummy in the sitting position.