MetaboListem and TABoLiSTM: Two Deep Learning Algorithms for Metabolite Named Entity Recognition.

Metabolites

Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK.

Published: March 2022

Reviewing the metabolomics literature is becoming increasingly difficult because of the rapid growth in relevant journal publications. Text-mining technologies are therefore needed to make literature reviews more efficient. Here we contribute a standardised corpus of full-text publications from metabolomics studies and describe the development of two metabolite named entity recognition (NER) methods. Both methods are based on Bidirectional Long Short-Term Memory (BiLSTM) networks, and each incorporates different transfer learning techniques (for tokenisation and word embedding). Our first model, MetaboListem, follows prior methodology using GloVe word embeddings. Our second model, TABoLiSTM (Transformer-Affixed BiLSTM), uses BERT and BioBERT for embedding. The methods are trained on a novel corpus annotated using rule-based methods and evaluated on manually annotated metabolomics articles. MetaboListem (F1-score 0.890, precision 0.892, recall 0.888) and TABoLiSTM (BioBERT version: F1-score 0.909, precision 0.926, recall 0.893) achieve state-of-the-art performance on metabolite NER. We created a training corpus of sentences from >1000 full-text Open Access metabolomics publications containing 105,335 annotated metabolites, as well as a manually annotated test corpus (19,138 annotations). This work demonstrates that deep learning algorithms can identify metabolite names in text accurately and efficiently. The proposed corpus and NER algorithms can be used for metabolomics text-mining tasks such as information retrieval, document classification and literature-based discovery, and are available from the omicsNLP GitHub repository.
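The precision, recall and F1 figures quoted above can be illustrated with a minimal sketch of entity-level NER scoring. The abstract does not state the tagging scheme or matching criterion used in the evaluation, so the BIO tags, the exact-match rule, the function names and the example sentence below are all assumptions made for illustration, not the authors' evaluation code. The sketch also checks that the reported F1-scores agree with the harmonic mean of the reported precision and recall.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start token index, end token index, exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (single entity type)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity starts here; a stray "I" is treated as a start.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if both boundaries match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical sentence: two gold metabolites, the tagger finds only one.
tokens = ["Plasma", "levels", "of", "citric", "acid", "and", "glucose", "increased"]
gold_tags = ["O", "O", "O", "B", "I", "O", "B", "O"]
pred_tags = ["O", "O", "O", "B", "I", "O", "O", "O"]

p, r, f1 = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Consistency check on the figures reported in the abstract.
print(round(f1_from(0.892, 0.888), 3))  # MetaboListem
print(round(f1_from(0.926, 0.893), 3))  # TABoLiSTM (BioBERT version)
```

In the toy sentence the single predicted span is correct (precision 1.0) but one of two gold metabolites is missed (recall 0.5), which shows why both values are reported alongside F1.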


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9031427
DOI: http://dx.doi.org/10.3390/metabo12040276

Publication Analysis

Top Keywords

deep learning: 8
learning algorithms: 8
metabolite named: 8
named entity: 8
entity recognition: 8
corpus full-text: 8
manually annotated: 8
metabolomics: 5
corpus: 5
metabolistem tabolistm: 4

Similar Publications

Purpose: Identifying muscles linked to postoperative physical function can guide protocols to enhance early recovery following total hip arthroplasty (THA). This study aimed to evaluate the association of preoperative pelvic and thigh muscle volume and quality with early physical function after THA in patients with unilateral hip osteoarthritis (HOA).

Methods: Preoperative computed tomography (CT) images of 61 patients (eight males and 53 females) with HOA were analyzed.


In this research, a green approach utilizing deep eutectic solvent liquid-liquid microextraction is combined with smartphone digital image colorimetry for the determination of boron in nut samples. A smartphone camera was used to capture the image of the analyte extract located in a custom-made colorimetric box. Using ImageJ software, the images were split into RGB channels, with the green channel identified as the optimum.


Highly accurate real-space electron densities with neural networks.

J Chem Phys

January 2025

Microsoft Research AI for Science, 21 Station Road, Cambridge CB1 2FB, United Kingdom.

Variational ab initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows, in principle, straightforward extraction of any other observable of interest, besides the energy, but, in practice, this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation.


With the global population aging at an unprecedented rate, there is a need to extend healthy productive life span. This review examines how Deep Learning (DL) and Generative Artificial Intelligence (GenAI) are used in biomarker discovery, deep aging clock development, geroprotector identification and generation of dual-purpose therapeutics targeting aging and disease. The paper explores the emergence of multimodal, multitasking research systems highlighting promising future directions for GenAI in human and animal aging research, as well as clinical application in healthy longevity medicine.


Background: Detection and segmentation of lung tumors on CT scans are critical for monitoring cancer progression, evaluating treatment responses, and planning radiation therapy; however, manual delineation is labor-intensive and subject to physician variability. Purpose: To develop and evaluate an ensemble deep learning model for automating identification and segmentation of lung tumors on CT scans. Materials and Methods: A retrospective study was conducted between July 2019 and November 2024 using a large dataset of CT simulation scans and clinical lung tumor segmentations from radiotherapy plans.

