Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review.

JMIR Med Inform

Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France.

Published: December 2023

Background: In recent years, health data collected during the clinical care process have often been repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible.

Objective: The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying (1) the NLP tasks applied to data from CDWs and (2) the NLP methods used to tackle these tasks.

Methods: This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles that met the following criteria: (1) a focus on NLP applied to textual data from CDWs, (2) publication between 1995 and 2021, and (3) written in English.

Results: We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%).
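The symbolic approach mentioned in the results (regular expressions that exploit a specialized lexicon such as a drug list) can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the three-term lexicon, the function name `extract_drug_mentions`, and the sample note are all hypothetical, and a real system would load a curated terminology and handle spelling variants, negation, and dosage.

```python
import re

# Hypothetical mini-lexicon (assumption for illustration only);
# a real pipeline would load a curated drug terminology.
DRUG_LEXICON = ["metformin", "warfarin", "amoxicillin"]

# Build one alternation pattern with word boundaries.
# Longer terms are placed first so they win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(DRUG_LEXICON, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def extract_drug_mentions(text):
    """Return (normalized term, start offset, end offset) for each lexicon hit."""
    return [(m.group(1).lower(), m.start(), m.end()) for m in _PATTERN.finditer(text)]


# Hypothetical free-text note used only to exercise the function.
note = "Patient on Metformin 500 mg; warfarin stopped last week."
mentions = extract_drug_mentions(note)
```

Each returned tuple carries character offsets, so a downstream step could map the mention back to its position in the source document or attach it to a structured record.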

Conclusions: CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757232
DOI: http://dx.doi.org/10.2196/42477
