Validation of a natural language processing algorithm to identify adenomas and measure adenoma detection rates across a health system: a population-level study.

Gastrointest Endosc

Service of Gastroenterology, St Joseph's Hospital, Hamilton, Ontario, Canada; Division of Gastroenterology, Department of Medicine, McMaster University, Hamilton, Ontario, Canada.

Published: January 2023

Background and Aims: Measuring adenoma detection rates (ADRs) at the population level is challenging because pathology reports are often written in an unstructured format and reporting methods vary significantly across institutions. Natural language processing (NLP) can be used to extract relevant information from text-based records. We aimed to develop and validate an NLP algorithm to identify colorectal adenomas that could be used to report ADR at the population level in Ontario, Canada.

Methods: The sampling frame included pathology reports from all colonoscopies performed in Ontario in 2015 and 2016. Two random samples of 450 and 1000 reports were selected as the training and validation sets, respectively. Expert clinicians reviewed and classified reports as adenoma or other. The training set was used to develop an NLP algorithm (to identify adenomas) that was evaluated using the validation set. The NLP algorithm test characteristics were calculated using expert review as the reference. We used the algorithm to measure ADR for all endoscopists in Ontario in 2019.
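As an illustration of the workflow described in the Methods, the following is a minimal sketch of classifying report text and rolling the result up to a per-endoscopist ADR. The abstract does not describe the algorithm's actual rules, so the regular expressions, the function names (`is_adenoma`, `adenoma_detection_rates`), and the assumption of one pathology report per colonoscopy are all hypothetical and are not the authors' method.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only; the published algorithm's
# rules are not given in the abstract.
ADENOMA = re.compile(r"\badenoma(s|tous)?\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|negative for|without)\b[^.;]*\badenoma", re.IGNORECASE)


def is_adenoma(report: str) -> bool:
    """Return True if any sentence mentions an adenoma without a negation cue."""
    for sentence in re.split(r"[.;\n]", report):
        if ADENOMA.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


def adenoma_detection_rates(records):
    """Compute ADR per endoscopist.

    records: iterable of (endoscopist_id, report_text) pairs, assumed here
    to be one pathology report per colonoscopy.
    """
    totals = defaultdict(int)
    detected = defaultdict(int)
    for endoscopist, report in records:
        totals[endoscopist] += 1
        detected[endoscopist] += is_adenoma(report)  # bool counts as 0 or 1
    return {e: detected[e] / totals[e] for e in totals}


if __name__ == "__main__":
    sample = [
        ("A", "Tubular adenoma."),
        ("A", "Hyperplastic polyp. No evidence of adenoma."),
        ("B", "Tubulovillous adenoma with low-grade dysplasia"),
    ]
    print(adenoma_detection_rates(sample))
```

A keyword-and-negation approach like this is only a stand-in; a production algorithm would need to handle the variation in reporting styles across laboratories that the abstract highlights.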

Results: The 1450 pathology reports were derived from 62 laboratories, 266 pathologists, and 532 endoscopists. In the training set, the NLP algorithm for any adenoma had a sensitivity of 99.60% (95% confidence interval [CI], 97.77-99.99), specificity of 99.01% (95% CI, 96.49-99.88), positive predictive value of 99.19% (95% CI, 97.12-99.90), and F1 score of 0.99. Similar results were obtained for the validation set. The median ADR was 33% (interquartile range, 26%-40%).

Conclusions: In a population-based sample from Ontario, our NLP algorithm was highly accurate and was used at the system level to measure ADR.

Source
http://dx.doi.org/10.1016/j.gie.2022.07.009

Similar Publications

Background: In online mental health communities, the interactions among members can significantly reduce their psychological distress and enhance their mental well-being. The overall quality of support from others varies due to differences in people's capacities to help others. This results in some support seekers' needs being met, while others remain unresolved.

During the Covid-19 pandemic, the widespread use of social media platforms has facilitated the dissemination of information, fake news, and propaganda, serving as a vital source of self-reported symptoms related to Covid-19. Existing graph-based models, such as Graph Neural Networks (GNNs), have achieved notable success in Natural Language Processing (NLP). However, utilizing GNN-based models for propaganda detection remains challenging because of the challenges related to mining distinct word interactions and storing nonconsecutive and broad contextual data.

Background: Investigators conducting clinical trials have an ethical, scientific, and regulatory obligation to protect the safety of trial participants. Traditionally, safety monitoring includes manual review and coding of adverse event data by expert clinicians.

Objectives: Our study explores the use of natural language processing (NLP) and artificial intelligence (AI) methods to streamline and standardize clinician coding of adverse event data in Alzheimer's disease (AD) clinical trials.

Purpose: Compare the identification of patients with established status epilepticus (ESE) and refractory status epilepticus (RSE) in electronic health records (EHR) using human review versus natural language processing (NLP) assisted review.

Methods: We reviewed EHRs of patients aged 1 month to 21 years from Boston Children's Hospital (BCH). We included all patients with convulsive ESE or RSE during admission.
