Despite decades of study, large parts of the mammalian metabolome remain unexplored. Mass spectrometry-based metabolomics routinely detects thousands of small molecule-associated peaks within human tissues and biofluids, but typically only a small fraction of these can be identified, and structure elucidation of novel metabolites remains a low-throughput endeavor. Biochemical large language models have transformed the interpretation of DNA, RNA, and protein sequences, but have not yet had a comparable impact on understanding small molecule metabolism. Here, we present an approach that leverages chemical language models to discover previously uncharacterized metabolites. We introduce DeepMet, a chemical language model that learns the latent biosynthetic logic embedded within the structures of known metabolites and exploits this understanding to anticipate the existence of as-of-yet undiscovered metabolites. Prospective chemical synthesis of metabolites predicted to exist by DeepMet directs their targeted discovery. Integrating DeepMet with tandem mass spectrometry (MS/MS) data enables automated metabolite discovery within complex tissues. We harness DeepMet to discover several dozen structurally diverse mammalian metabolites. Our work demonstrates the potential for language models to accelerate the mapping of the metabolome.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601323 | PMC |
http://dx.doi.org/10.1101/2024.11.13.623458 | DOI Listing |
Proc Natl Acad Sci U S A
January 2025
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples.
View Article and Find Full Text PDFJMIR Med Inform
January 2025
Department of Science and Education, Shenzhen Baoan Women's and Children's Hospital, Shenzhen, China.
Background: Large language models (LLMs) have been proposed as valuable tools in medical education and practice. The Chinese National Nursing Licensing Examination (CNNLE) presents unique challenges for LLMs due to its requirement for both deep domain-specific nursing knowledge and the ability to make complex clinical decisions, which differentiates it from more general medical examinations. However, their potential application in the CNNLE remains unexplored.
View Article and Find Full Text PDFJMIR AI
January 2025
Department of Radiology, Children's National Hospital, Washington, DC, United States.
PLoS Comput Biol
January 2025
Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America.
Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources.
View Article and Find Full Text PDFPLoS One
January 2025
Department of Biomedical and Health Informatics, Tsui Laboratory, Children's Hospital of Philadelphia, Philadelphia, PA, United States of America.
Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!