Large language models improve annotation of prokaryotic viral proteins.

Nat Microbiol

Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA.

Published: February 2024

Viral genomes are poorly annotated in metagenomic samples, representing an obstacle to understanding viral diversity and function. Current annotation approaches rely on alignment-based sequence homology methods, which are limited by the paucity of characterized viral proteins and divergence among viral sequences. Here we show that protein language models can capture prokaryotic viral protein function, enabling new portions of viral sequence space to be assigned biologically meaningful labels. When applied to global ocean virome data, our classifier expanded the annotated fraction of viral protein families by 29%. Among previously unannotated sequences, we highlight the identification of an integrase defining a mobile element in marine picocyanobacteria and a capsid protein that anchors globally widespread viral elements. Furthermore, improved high-level functional annotation provides a means to characterize similarities in genomic organization among diverse viral sequences. Protein language models thus enhance remote homology detection of viral proteins, serving as a useful complement to existing approaches.
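The abstract describes a classifier built on protein language model representations that assigns functional labels to viral proteins, including ones that alignment-based homology search leaves unannotated. The block below is a minimal sketch of that general idea and is not the authors' implementation. It assumes per-residue embeddings have already been produced by some protein language model. It mean-pools them into one vector per protein and then uses a simple nearest-centroid classifier with cosine similarity in place of whatever classifier the paper actually trains. The function names, the toy three-dimensional vectors, the category labels, and the `min_similarity` cutoff are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue vectors into one fixed-length protein-level vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(r[i] for r in residue_embeddings) / n for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(training: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Compute one mean embedding (centroid) per functional category."""
    return {label: mean_pool(vectors) for label, vectors in training.items()}


def classify(
    embedding: Vector,
    centroids: Dict[str, Vector],
    min_similarity: float = 0.8,  # hypothetical cutoff, not from the paper
) -> Optional[str]:
    """Return the closest category, or None if nothing is similar enough."""
    best_label: Optional[str] = None
    best_score = -1.0
    for label, centroid in centroids.items():
        score = cosine(embedding, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


# Toy, made-up protein-level embeddings for two illustrative categories.
training = {
    "integration and excision": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "head and packaging": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
centroids = fit_centroids(training)

# A query protein given as made-up per-residue embeddings.
query = mean_pool([[0.95, 0.05, 0.0], [0.85, 0.15, 0.05]])

if __name__ == "__main__":
    print(classify(query, centroids))            # closest centroid wins
    print(classify([0.0, 0.0, 1.0], centroids))  # below cutoff: left unlabeled
```

Returning `None` below the cutoff mirrors leaving a protein unannotated rather than forcing a low-confidence label. A real pipeline would tune such a threshold, and the choice of classifier, on held-out data.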


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11311208
DOI: http://dx.doi.org/10.1038/s41564-023-01584-8
