Efficient generative modeling of protein sequences using simple autoregressive models.

Nat Commun

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005, Paris, France.

Published: October 2021

Generative models emerge as promising candidates for novel sequence-data driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 10^2 and 10^3). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into an improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. 10^68 possible sequences, which nevertheless constitute only the astronomically small fraction 10^-80 of all amino-acid sequences of the same length. These findings illustrate the potential and the difficulty in exploring sequence space via generative sequence models.
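The abstract's claims about sequence probability and entropy can be illustrated with a minimal sketch. It assumes an autoregressive factorization P(a_1..a_L) = prod_i P(a_i | a_1..a_{i-1}), with each conditional modelled as a softmax over a site field plus couplings to earlier sites. The parameter names (`h`, `J`), the 21-letter alphabet, the random toy parameters, and every function name are hypothetical illustration choices, not the authors' code or trained model.

```python
import math
import random

Q = 21  # alphabet size: 20 amino acids plus a gap symbol (assumption)

def make_toy_params(L, q=Q, scale=0.5, seed=0):
    """Random toy parameters (hypothetical, not fitted to any protein family)."""
    rng = random.Random(seed)
    h = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(L)]
    # J[i][j][a][b] couples site i in state a to an earlier site j < i in state b
    J = [[[[rng.gauss(0, scale) for _ in range(q)] for _ in range(q)]
          for _ in range(i)] for i in range(L)]
    return h, J

def conditional_log_probs(h, J, seq, i):
    """log P(a_i = a | a_1..a_{i-1}) for every state a, via a stable softmax."""
    q = len(h[i])
    logits = [h[i][a] + sum(J[i][j][a][seq[j]] for j in range(i)) for a in range(q)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def log_prob(h, J, seq):
    """Exact log-probability of a full sequence: a sum of L conditional terms."""
    return sum(conditional_log_probs(h, J, seq, i)[seq[i]] for i in range(len(seq)))

def sample(h, J, rng):
    """Draw one sequence site by site; no Markov chain equilibration is needed."""
    seq = []
    for i in range(len(h)):
        probs = [math.exp(lp) for lp in conditional_log_probs(h, J, seq, i)]
        seq.append(rng.choices(range(len(probs)), weights=probs)[0])
    return seq

def entropy_estimate(h, J, rng, n_samples=1000):
    """Monte Carlo estimate of the entropy S = -E[log P(seq)], in nats."""
    total = 0.0
    for _ in range(n_samples):
        total -= log_prob(h, J, sample(h, J, rng))
    return total / n_samples

# Toy usage: exp(S) serves as an effective number of sequences under the model.
h, J = make_toy_params(L=5)
rng = random.Random(1)
seq = sample(h, J, rng)
S = entropy_estimate(h, J, rng, n_samples=500)
print("sample:", seq, "log P:", log_prob(h, J, seq))
print("entropy (nats):", S, "effective number of sequences:", math.exp(S))
```

Because every conditional is normalized on its own, the sequence probability is exact and needs no partition function, and the entropy follows from an average of log-probabilities over independent samples. This is the kind of structural convenience the abstract refers to, shown here only on random parameters.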

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8490405
DOI: http://dx.doi.org/10.1038/s41467-021-25756-4
