Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new drugs and restricting the use of those already in circulation. Moreover, its multifactorial nature, combined with a clinical presentation that often mimics other liver diseases, complicates the identification of DILI-related (or "positive") literature, which remains the main medium for sourcing results from clinical practice and experimental studies. This work, a contribution to the "Literature AI for DILI Challenge" of the Critical Assessment of Massive Data Analysis (CAMDA) 2021, presents an automated pipeline for distinguishing between DILI-positive and DILI-negative publications. We used Natural Language Processing (NLP) to filter out the uninformative parts of a text and to identify and extract mentions of chemicals and diseases. We combined that information with small-molecule and disease embeddings, which are capable of capturing chemical and disease similarities, to improve classification performance. The former were directly sourced from the Chemical Checker (CC). For the latter, we collected data that encode different aspects of disease similarity from the National Library of Medicine's (NLM) Medical Subject Headings (MeSH) thesaurus and the Comparative Toxicogenomics Database (CTD). Following a procedure similar to the one used in the CC, vector representations for diseases were learnt and evaluated. Two Neural Network (NN) classifiers were developed: a baseline model that accepts texts as input, and an extended (augmented) model that also utilises chemical and disease embeddings. We trained, validated, and tested the classifiers through a Nested Cross-Validation (NCV) scheme with 10 outer and 5 inner folds. Under this scheme, the baseline and extended models performed virtually identically, with F-scores of 95.04 ± 0.61% and 94.80 ± 0.41%, respectively. Upon validation on an external, withheld dataset meant to assess classifier generalisability, the extended model achieved an F-score of 91.14 ± 1.62%, outperforming its baseline counterpart, which scored 88.30 ± 2.44%. We make further comparisons between the classifiers and discuss future improvements and directions, including utilising chemical and disease embeddings for visualisation and exploratory analysis of the DILI-positive literature.
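The Nested Cross-Validation scheme mentioned in the abstract (10 outer and 5 inner folds) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract's classifiers are neural networks operating on texts and embeddings, whereas this sketch assumes a toy one-dimensional threshold classifier, synthetic data, unstratified folds, and a hypothetical `shift` hyperparameter. All function names are illustrative. The point it shows is that the inner folds select a hyperparameter using only the outer training data, and the outer folds report the F-score on data never used for that selection.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n, k, seed):
    """Shuffle range(n) and split it into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def f1_score(y_true, y_pred):
    """F-score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit(X, y, shift):
    """Toy stand-in model: threshold at the midpoint of class means plus `shift`."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (mean(pos) + mean(neg)) / 2 + shift


def predict(threshold, X):
    return [1 if x > threshold else 0 for x in X]


def nested_cv(X, y, param_grid, outer_k=10, inner_k=5, seed=0):
    """Return one held-out F-score per outer fold."""
    outer_folds = k_fold_indices(len(X), outer_k, seed)
    outer_scores = []
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for j, f in enumerate(outer_folds) if j != o for i in f]
        # Inner folds are positions within the outer training set only.
        inner_folds = k_fold_indices(len(train_idx), inner_k, seed + 1 + o)

        def inner_score(param):
            scores = []
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                tr = [train_idx[p]
                      for j, f in enumerate(inner_folds) if j != v for p in f]
                model = fit([X[i] for i in tr], [y[i] for i in tr], param)
                preds = predict(model, [X[i] for i in val])
                scores.append(f1_score([y[i] for i in val], preds))
            return mean(scores)

        # Model selection sees no outer test data.
        best = max(param_grid, key=inner_score)
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], best)
        preds = predict(model, [X[i] for i in test_idx])
        outer_scores.append(f1_score([y[i] for i in test_idx], preds))
    return outer_scores


# Synthetic, balanced toy data: positives centred at +1, negatives at -1.
rng = random.Random(42)
X = [rng.gauss(1, 1) for _ in range(100)] + [rng.gauss(-1, 1) for _ in range(100)]
y = [1] * 100 + [0] * 100

scores = nested_cv(X, y, param_grid=[-0.5, 0.0, 0.5])
print(f"F-score: {mean(scores):.3f} ± {stdev(scores):.3f}")
```

The mean and standard deviation over the outer folds are one common way to arrive at a summary of the form "F-score ± spread"; whether the figures in the abstract were aggregated in exactly this way is an assumption.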


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395939
DOI: http://dx.doi.org/10.3389/fgene.2022.894209


