Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Settings: Natural Language Processing Approach.

JMIR AI

Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States.

Published: January 2024

AI Article Synopsis

  • Pancreatic cancer, particularly pancreatic ductal adenocarcinoma (PDAC), is a leading cause of cancer death in the U.S., highlighting the need for early detection based on symptom recognition.
  • The study aimed to create a natural language processing (NLP) algorithm to extract PDAC-related symptoms from clinical notes, using notes from the 2 years before diagnosis for patients with PDAC and notes from matched patients without PDAC.
  • Validated against adjudicated chart review, the algorithm showed high precision (84% to 98.9%) and recall (82.8% to 98.1%) in identifying PDAC-related symptoms, which suggests it could support early detection.

Article Abstract

Background: Pancreatic cancer is the third leading cause of cancer deaths in the United States. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms often trigger a cancer diagnosis; understanding PDAC-associated symptoms and the timing of their onset could therefore facilitate early detection of PDAC.

Objective: This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated health care system.

Methods: To identify 17 PDAC-related symptoms, we used unstructured data from the 2 years before PDAC diagnosis (diagnoses made between 2010 and 2019) and from matched patients without PDAC. Related terms and phrases were first compiled from publicly available resources and then iteratively reviewed and enriched with input from clinicians and chart review. A computerized NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation data set to assess performance and then to the study implementation notes.
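The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of one common way a term-list approach can work: match lexicon terms in a note and skip mentions preceded by a negation cue in the same sentence. The lexicon entries, the negation cues, and the function name `extract_symptoms` are all illustrative assumptions, not the study's actual term list or method.

```python
import re

# Hypothetical lexicon: symptom -> example terms (NOT the study's term list).
SYMPTOM_TERMS = {
    "jaundice": ["jaundice", "icterus", "yellowing of the skin"],
    "weight loss": ["weight loss", "lost weight", "losing weight"],
    "back pain": ["back pain", "backache"],
}

# Assumed negation cues. A cue negates a term only when it appears
# earlier in the same sentence.
NEGATION_CUES = re.compile(
    r"\b(no|not|denies|denied|without|negative for)\b", re.IGNORECASE
)


def extract_symptoms(note: str) -> set[str]:
    """Return symptoms that are mentioned and not negated in a note."""
    found = set()
    # Naive sentence split on ., ;, !, ? and newlines.
    for sentence in re.split(r"[.;!?\n]+", note):
        for symptom, terms in SYMPTOM_TERMS.items():
            for term in terms:
                match = re.search(
                    r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE
                )
                if match is None:
                    continue
                # Check only the text before the first match of this term.
                prefix = sentence[: match.start()]
                if NEGATION_CUES.search(prefix) is None:
                    found.add(symptom)
    return found


if __name__ == "__main__":
    example = "Patient reports weight loss over 3 months. Denies jaundice; has backache."
    print(sorted(extract_symptoms(example)))
```

A production system would need far more than this (section detection, family-history and hypothetical mentions, misspellings), which is presumably why the authors describe multiple rounds of chart review and adjudication.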

Results: A total of 408,147 and 709,789 notes were retrieved from 2611 patients with PDAC and 10,085 matched patients without PDAC, respectively. In descending order, the symptom distribution of the study implementation notes ranged from 4.98% for abdominal or epigastric pain to 0.05% for upper extremity deep vein thrombosis in the PDAC group, and from 1.75% for back pain to 0.01% for pale stool in the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results of 1000 notes showed that precision ranged from 98.9% (jaundice) to 84% (upper extremity deep vein thrombosis), recall ranged from 98.1% (weight loss) to 82.8% (epigastric bloating), and F-scores ranged from 0.97 (jaundice) to 0.86 (depression).
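For readers less familiar with the validation metrics, the sketch below shows how precision, recall, and an F-score can be computed for one symptom by comparing note-level algorithm output with adjudicated chart review labels. It assumes the reported F-score is the balanced F1 measure, which the abstract does not state; the function name and the toy labels are hypothetical.

```python
def precision_recall_f1(
    predicted: list[bool], reference: list[bool]
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one symptom.

    predicted: algorithm output per note (True = symptom present).
    reference: adjudicated chart review label per note.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # true positives
    fp = sum(1 for p, r in pairs if p and not r)    # false positives
    fn = sum(1 for p, r in pairs if not p and r)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for five notes (illustrative only, not study data).
    algo = [True, True, False, True, False]
    chart = [True, False, True, True, False]
    print(precision_recall_f1(algo, chart))
```

Computing the metrics per symptom, as the abstract reports them, keeps rare symptoms such as upper extremity deep vein thrombosis from being masked by common ones.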

Conclusions: The developed and validated NLP algorithm could be used for the early detection of PDAC.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11041417
DOI: http://dx.doi.org/10.2196/51240

Publication Analysis

Top Keywords

nlp algorithm: 16
patients pdac: 12
chart review: 12
pancreatic ductal: 8
ductal adenocarcinoma: 8
natural language: 8
language processing: 8
pancreatic cancer: 8
pdac: 8
early detection: 8
