AI Article Synopsis

  • Despite extensive research, much of the mammalian metabolome remains unexplored: mass spectrometry detects thousands of small-molecule peaks, but only a small fraction can be identified.
  • A novel approach using DeepMet, a chemical language model, has been developed to uncover previously unknown metabolites by learning the biosynthetic logic of known compounds.
  • By combining DeepMet with tandem mass spectrometry, the research enables automated discovery of a wide range of metabolites, demonstrating the potential of language models to enhance our understanding of the metabolome.

Article Abstract

Despite decades of study, large parts of the mammalian metabolome remain unexplored. Mass spectrometry-based metabolomics routinely detects thousands of small molecule-associated peaks within human tissues and biofluids, but typically only a small fraction of these can be identified, and structure elucidation of novel metabolites remains a low-throughput endeavor. Biochemical large language models have transformed the interpretation of DNA, RNA, and protein sequences, but have not yet had a comparable impact on understanding small molecule metabolism. Here, we present an approach that leverages chemical language models to discover previously uncharacterized metabolites. We introduce DeepMet, a chemical language model that learns the latent biosynthetic logic embedded within the structures of known metabolites and exploits this understanding to anticipate the existence of as-of-yet undiscovered metabolites. Prospective chemical synthesis of metabolites predicted to exist by DeepMet directs their targeted discovery. Integrating DeepMet with tandem mass spectrometry (MS/MS) data enables automated metabolite discovery within complex tissues. We harness DeepMet to discover several dozen structurally diverse mammalian metabolites. Our work demonstrates the potential for language models to accelerate the mapping of the metabolome.
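The abstract describes training a chemical language model on the structures of known metabolites and sampling it to propose candidate structures for targeted discovery. As a purely illustrative sketch (not the DeepMet architecture, which is not detailed here), the core generative idea can be shown with a toy character-level bigram model over SMILES strings: learn transition statistics from known structures, then sample novel candidate strings. The molecules and model below are hypothetical stand-ins; real chemical language models use neural networks, large training corpora, and chemical validity checks.

```python
import random
from collections import defaultdict

# Toy training set: SMILES strings of a few known metabolites.
KNOWN_METABOLITES = [
    "CC(=O)O",            # acetic acid
    "C(C(=O)O)N",         # glycine
    "CC(C(=O)O)N",        # alanine
    "C(CC(=O)O)C(=O)O",   # succinic acid
]

def train_bigram(smiles_list):
    """Count character transitions, with start (^) and end ($) markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in smiles_list:
        chars = ["^"] + list(s) + ["$"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, rng, max_len=20):
    """Sample one candidate string by walking the learned transitions."""
    out, cur = [], "^"
    for _ in range(max_len):
        options = list(counts[cur])
        weights = [counts[cur][c] for c in options]
        cur = rng.choices(options, weights=weights)[0]
        if cur == "$":
            break
        out.append(cur)
    return "".join(out)

rng = random.Random(0)
model = train_bigram(KNOWN_METABOLITES)
candidates = {sample(model, rng) for _ in range(5)}
print(candidates)
```

In the paper's workflow, sampled candidates would then be prioritized, synthesized, and matched against MS/MS data from tissue; this sketch only captures the "learn structural statistics, then generate" step.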


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601323
DOI: http://dx.doi.org/10.1101/2024.11.13.623458

Publication Analysis

Top Keywords

language models (12)
chemical language (8)
metabolites (7)
language (5)
language model-guided (4)
model-guided anticipation (4)
anticipation discovery (4)
discovery unknown (4)
unknown metabolites (4)
metabolites despite (4)

Similar Publications

Learning the language of antibody hypervariability.

Proc Natl Acad Sci U S A

January 2025

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.


Background: Large language models (LLMs) have been proposed as valuable tools in medical education and practice. The Chinese National Nursing Licensing Examination (CNNLE) presents unique challenges for LLMs due to its requirement for both deep domain-specific nursing knowledge and the ability to make complex clinical decisions, which differentiates it from more general medical examinations. However, their potential application in the CNNLE remains unexplored.


The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.

PLoS Comput Biol

January 2025

Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America.

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources.


Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.

