Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Database (Oxford)

Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.

Published: July 2022

The identification of chemicals in articles has attracted considerable interest in the biomedical scientific community, given its importance in drug development research. Most previous research has focused on PubMed abstracts; further investigation using full-text documents is required because these contain additional valuable information that must be explored. Manual expert indexing of these articles with Medical Subject Headings (MeSH) terms later helps researchers find the publications most relevant to their ongoing work. The BioCreative VII NLM-Chem track fostered the development of systems for chemical identification and indexing in PubMed full-text articles. Chemical identification consisted of identifying chemical mentions and linking them to unique MeSH identifiers. This manuscript describes our participating system and the post-challenge improvements we made. We propose a three-stage pipeline that separately performs chemical mention detection, entity normalization and indexing. For chemical mention detection, we adopted a deep-learning solution that uses PubMedBERT contextualized embeddings followed by a multilayer perceptron and a conditional random field tagging layer. For normalization, we use sieve-based dictionary filtering followed by a deep-learning similarity search strategy. Finally, for indexing, we developed rules for identifying the most relevant MeSH codes for each article. During the challenge, our system obtained the best official results in the normalization and indexing tasks despite lower performance in the chemical mention recognition task. In a post-challenge phase, we improved our results by enhancing our named entity recognition model with additional techniques. The final system achieved 0.8731, 0.8275 and 0.4849 in the chemical identification, normalization and indexing tasks, respectively. The code to reproduce our experiments and run the pipeline is publicly available.
Database URL https://github.com/bioinformatics-ua/biocreativeVII_track2.
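The normalization and indexing stages can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a toy dictionary with hypothetical placeholder identifiers (not real MeSH codes), and a sieve order of exact match first, then a match after lowercasing and stripping punctuation. In place of the deep-learning similarity search described in the abstract, the final step uses character-trigram cosine similarity as a simple stand-in. The indexing rule (keep a concept if it appears in the title or at least `min_count` times) and the names `link_mention` and `index_document` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary: surface form -> concept ID.
# The IDs are placeholders, NOT real MeSH identifiers.
DICTIONARY = {
    "acetylsalicylic acid": "MESH:HYPO_1",
    "aspirin": "MESH:HYPO_1",
    "d-glucose": "MESH:HYPO_2",
    "glucose": "MESH:HYPO_2",
    "sodium chloride": "MESH:HYPO_3",
}


def normalize_surface(text):
    """Lowercase and collapse punctuation/whitespace runs into single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


# Pre-computed normalized keys used by the second sieve.
NORMALIZED = {normalize_surface(name): cid for name, cid in DICTIONARY.items()}


def trigrams(text):
    """Character trigram counts of the normalized, space-padded string."""
    padded = f"  {normalize_surface(text)} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_mention(mention, threshold=0.5):
    """Return a concept ID for a mention, or None if every sieve fails."""
    # Sieve 1: exact (case-sensitive) dictionary lookup.
    if mention in DICTIONARY:
        return DICTIONARY[mention]
    # Sieve 2: lookup after lowercasing and stripping punctuation.
    key = normalize_surface(mention)
    if key in NORMALIZED:
        return NORMALIZED[key]
    # Fallback (stand-in for a learned similarity search):
    # nearest dictionary entry by character-trigram cosine similarity.
    query = trigrams(mention)
    best_name, best_score = None, 0.0
    for name in DICTIONARY:
        score = cosine(query, trigrams(name))
        if score > best_score:
            best_name, best_score = name, score
    return DICTIONARY[best_name] if best_score >= threshold else None


def index_document(mentions, title_mentions=(), min_count=2):
    """Hypothetical indexing rule: keep IDs seen in the title or >= min_count times."""
    counts = Counter(cid for cid in map(link_mention, mentions) if cid)
    title_ids = {cid for cid in map(link_mention, title_mentions) if cid}
    keep = {cid for cid, n in counts.items() if n >= min_count} | title_ids
    return sorted(keep)
```

For example, `link_mention("D-Glucose")` misses the exact-match sieve but resolves through the normalized sieve, while a misspelling such as `"glucoze"` only resolves through the similarity fallback. Ordering cheap, high-precision sieves before the similarity search means most mentions never reach the more expensive (and less precise) final step.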

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9248917
DOI: http://dx.doi.org/10.1093/database/baac047

Similar Publications

The hypoxia-inducible factor-1 alpha (HIF-1 alpha) is a major regulator of adaptive response to hypoxia, common in patients with severe coronavirus disease 2019 (COVID-19). In addition, HIF-1 alpha regulates the expression of the most important proteins necessary for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection of cells. The study included 129 hospitalized COVID-19 patients.

Marine-Derived Compound Targeting mTOR and FGFR-2: A Promising Strategy for Breast, Lung, and Colorectal Cancer Therapy.

Med Chem

January 2025

Integrated Genetics and Molecular Oncology Group, Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, 603203, India.

Introduction: The marine habitat is a plentiful source of diverse, active compounds that are extensively utilised for their medicinal properties. Pharmaceutical trends have recently shifted towards utilising a diverse range of products derived from the marine environment.

Method: This study aimed to examine the inhibitory effects of bioactive chemicals derived from marine algae and bacteria.

Identification of potential MMP-8 inhibitors through virtual screening of natural product databases.

In Silico Pharmacol

January 2025

College of Chemistry and Chemical Engineering, China University of Petroleum, Qingdao, 266580 China.

Matrix metalloproteinase-8 (MMP-8), a type II collagenase, is a key enzyme in the degradation of collagens and is implicated in various pathological processes, making it a promising target for drug discovery. Despite advancements in the development of MMP-8 inhibitors, concerns over potential adverse effects persist. This study aims to address these concerns by focusing on the development of novel compounds with improved safety profiles while maintaining efficacy.

Excessive beta oscillations in the subthalamic nucleus are established as a primary electrophysiological biomarker for motor impairment in Parkinson's disease and are currently used as feedback signals in adaptive deep brain stimulation systems. However, there is still a need for optimization of stimulation parameters and the identification of optimal biomarkers that can accommodate varying patient conditions, such as ON and OFF levodopa medication. The precise boundaries of 'pathological' oscillatory ranges, associated with different aspects of motor impairment, are still not fully clarified.

The increasing number of contaminants released into the environment necessitates innovative strategies for their detection and identification, particularly in complex environmental matrices such as hospital wastewater. Hospital effluents contain both natural and synthetic hormones that may contribute significantly to endocrine disruption in aquatic ecosystems. In this study, high-throughput effect-directed analysis (HT-EDA) was implemented to identify the main effect drivers (testosterone, androsterone and norgestrel) in hospital effluent using microplate fractionation, the AR-CALUX bioassay and an efficient data processing workflow.