Early Diagnosis of Pancreatic Cancer via Machine Learning Analysis of a National Electronic Medical Record Database.

JCO Clin Cancer Inform

Health Economics and Evidence Development, Novartis Oncology, East Hanover, NJ.

Published: September 2023

Purpose: Pancreatic cancer (PaC) is often diagnosed at advanced stages, resulting in one of the lowest survival rates among patients with cancer. The purpose of this study was to investigate whether machine learning (ML) models can predict with high sensitivity and specificity an increased risk for PaC ahead of clinical diagnosis.

Methods: Optum deidentified electronic health record (EHR) data set was used to extract 1-year data for each patient and to sample for PaC diagnosis, the number of interactions with the health care system, and unique demographic and clinical features. Data for patients with PaC diagnosis were collected between 1 and 2 years before the diagnosis. Standard binary classification ML models were used on training and testing data sets. Data analyses were performed using the scikit-learn package version 1.0.1.

Results: The data set consisted of 18,987 patient EHRs collected between December 31, 2007, and December 31, 2017. EHRs with 10 unique features and at least three health care interactions were used for model training (N = 15,189; n = 8,438 [56%] with PaC) and testing (N = 3,798; n = 2,127 [56%] with PaC). The ensemble model achieved an AUC of 0.89, a sensitivity of 85.61%, and a specificity of 76.18% on the testing data set and produced superior results compared with other binary classifiers. Increasing unique health care interactions to nine failed to improve the AUC score. When the testing data set was enlarged to 5,696 patients, the ensemble model achieved an AUC of 0.92 and a specificity of 93.21%, but the sensitivity was compromised.

Conclusion: The ensemble model exceeded the state-of-the-art level of performance for prediction of PaC ahead of clinical diagnosis with a minimal clinically guided input, providing a potential strategy for selection of high-risk patients for further screening.

Download full-text PDF

Source
http://dx.doi.org/10.1200/CCI.23.00076DOI Listing

Publication Analysis

Top Keywords

data set
16
health care
12
testing data
12
ensemble model
12
pancreatic cancer
8
machine learning
8
pac ahead
8
ahead clinical
8
data
8
pac diagnosis
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!