Background: Manual extraction of real-world clinical data for research can be time-consuming and error-prone. We assessed the feasibility of using natural language processing (NLP), an artificial intelligence (AI) technique, to automate data extraction for patients with advanced lung cancer (aLC). We evaluated the external validity of the NLP-extracted data by comparing our findings with those reported in the literature.
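
The abstract does not describe how DARWEN works internally, so the following is only a toy illustration of rule-based extraction, not the platform's method. Hypothetical regular expressions pull an ECOG performance status score and a smoking category out of free-text notes and return None when a field is absent. Missing values like those counted in the Results arise this way. Production clinical NLP systems typically also have to handle negation, dates, section context, and conflicting mentions, none of which this sketch attempts. The patterns, function names, and example notes are all invented.

```python
import re
from typing import Dict, List, Optional, Union

Record = Dict[str, Union[int, str, None]]

# Hypothetical patterns for illustration only.
ECOG_RE = re.compile(
    r"\bECOG(?:\s+(?:PS|performance\s+status))?\s*(?:(?:of|is|was)\s+|[:=]\s*)?([0-5])\b",
    re.IGNORECASE,
)
SMOKING_RE = re.compile(r"\b(never|former|current|ex)[\s-]?smok(?:er|ing)\b", re.IGNORECASE)


def extract_fields(note: str) -> Record:
    """Return the first ECOG score and smoking category found in a note, or None if absent."""
    ecog = ECOG_RE.search(note)
    smoking = SMOKING_RE.search(note)
    category: Optional[str] = None
    if smoking:
        word = smoking.group(1).lower()
        category = "former" if word == "ex" else word  # normalize "ex-smoker" to "former"
    return {"ecog": int(ecog.group(1)) if ecog else None, "smoking": category}


def count_missing(records: List[Record]) -> Dict[str, int]:
    """Count records in which each field could not be extracted."""
    keys = list(records[0].keys()) if records else []
    return {k: sum(1 for r in records if r[k] is None) for k in keys}


if __name__ == "__main__":
    notes = [  # invented example notes
        "65M, ex-smoker, ECOG PS: 1. Stage IV adenocarcinoma.",
        "Never smoker. Performance status not documented.",
        "ECOG performance status of 0; current smoker.",
    ]
    records = [extract_fields(n) for n in notes]
    print(records)
    print(count_missing(records))  # {'ecog': 1, 'smoking': 0}
```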

Methods: Patients diagnosed with stage IIIB or IV lung cancer at Princess Margaret Cancer Centre between January 2015 and December 2017 who received at least one dose of systemic therapy were included. Their electronic health records were provided to Pentavere's NLP platform, DARWEN, in March 2019. Descriptive statistics summarized baseline patient and cancer characteristics, molecular biomarkers, and first-line systemic therapies. Multivariable Cox proportional hazards models were used to evaluate prognostic factors in the advanced non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) cohorts.
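
The abstract does not say how the multivariable Cox models were fitted. The self-contained sketch below is not the authors' code; in practice a survival-analysis package would be used. It maximizes the Cox partial likelihood by Newton-Raphson and treats tied event times with the Breslow convention. It assumes the maximum likelihood estimate exists, meaning no covariate perfectly separates early from late events, and it takes no step-halving safeguards. The names `fit_cox` and `score_and_information` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_and_information(
    times: Sequence[float],
    events: Sequence[int],
    x: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> Tuple[List[float], Matrix]:
    """Score vector and observed information of the Cox partial log-likelihood.

    Tied event times share one risk set (Breslow approximation).
    """
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    w = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in x]  # relative hazards
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]  # still at risk at t_i
        s0 = sum(w[j] for j in risk)
        mean = [sum(w[j] * x[j][k] for j in risk) / s0 for k in range(p)]
        for k in range(p):
            score[k] += x[i][k] - mean[k]
            for q in range(p):
                second = sum(w[j] * x[j][k] * x[j][q] for j in risk) / s0
                info[k][q] += second - mean[k] * mean[q]  # weighted covariance in risk set
    return score, info


def fit_cox(
    times: Sequence[float],
    events: Sequence[int],
    x: Sequence[Sequence[float]],
    max_iter: int = 50,
    tol: float = 1e-10,
) -> List[float]:
    """Maximize the partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = [0.0] * len(x[0])
    for _ in range(max_iter):
        score, info = score_and_information(times, events, x, beta)
        step = _solve(info, score)  # Newton step: info^{-1} @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Invented toy data: three subjects, all with observed events, one binary covariate.
    beta = fit_cox([1, 2, 3], [1, 1, 1], [[1.0], [0.0], [1.0]])
    print(math.exp(beta[0]))  # hazard ratio, approximately 0.707 (1/sqrt(2)) for this data
```

Each exp(beta_k) is the hazard ratio for a one-unit increase in covariate k, with the other covariates held fixed. Newton-Raphson is invariant to rescaling a covariate. Entering age in decades instead of years therefore multiplies its coefficient by 10. As a result, an HR "per 10 years" equals the per-year HR raised to the 10th power.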

Results: NLP extracted clinical information for 333 patients in a total of 8 hours, with few missing values: smoking status was missing for 2 patients and Eastern Cooperative Oncology Group (ECOG) performance status for 5. Baseline patient and cancer characteristics summarized from the NLP-extracted data were comparable to those reported in previous studies and population reports. For NSCLC patients, male sex (HR 1.44, 95% CI [1.04, 2.00]), worse ECOG performance status (1.48 [1.22, 1.81]), and liver (2.24 [1.45, 3.46]), bone (2.09 [1.48, 2.96]), or lung metastases (1.54 [1.05, 2.26]) were associated with worse survival. For SCLC patients, older age (HR 1.70 per 10 years, 95% CI [1.10, 2.63]) and liver metastases (3.81 [1.61, 9.01]) were associated with worse survival.
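
Each hazard ratio in the Results is reported with a 95% CI. Assume these are standard Wald intervals from the Cox fits; the abstract does not say so. Under that assumption, each interval is symmetric on the log scale. The point estimate should then sit at the geometric mean of its bounds, and the standard error of log(HR) can be recovered from the interval width. The sketch below applies this as a quick consistency check on transcribed results. The same log-scale arithmetic re-expresses the SCLC age HR per single year. The function names and the 1% tolerance, chosen to absorb rounding to two decimals, are illustrative choices.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile (two-sided 95% CI)


def implied_hr(lower: float, upper: float) -> float:
    """Point estimate implied by a Wald CI, which is symmetric on the log scale."""
    return math.sqrt(lower * upper)  # geometric mean of the bounds


def log_hr_se(lower: float, upper: float) -> float:
    """Standard error of log(HR) recovered from the width of a 95% Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def is_consistent(hr: float, lower: float, upper: float, rel_tol: float = 0.01) -> bool:
    """True if a reported HR agrees with its CI up to rounding of the published bounds."""
    return math.isclose(hr, implied_hr(lower, upper), rel_tol=rel_tol)


def rescale_hr(hr_per_unit: float, units: float) -> float:
    """HR for a covariate change of `units`, given the HR for a one-unit change."""
    return hr_per_unit ** units


if __name__ == "__main__":
    # 95% CIs as reported in the Results.
    intervals = {
        "NSCLC: male": (1.04, 2.00),
        "NSCLC: worse ECOG": (1.22, 1.81),
        "NSCLC: liver metastases": (1.45, 3.46),
        "NSCLC: bone metastases": (1.48, 2.96),
        "NSCLC: lung metastases": (1.05, 2.26),
        "SCLC: age per 10 years": (1.10, 2.63),
        "SCLC: liver metastases": (1.61, 9.01),
    }
    for name, (lower, upper) in intervals.items():
        print(f"{name}: implied HR {implied_hr(lower, upper):.2f}, "
              f"SE(log HR) {log_hr_se(lower, upper):.3f}")
    # The SCLC age HR of 1.70 per 10 years, re-expressed per single year.
    print(f"SCLC age, per year: {rescale_hr(1.70, 1 / 10):.3f}")
```

Each interval's geometric mean should land within rounding of its published point estimate. A value that does not suggests a transcription error in either the estimate or its bounds.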

Conclusion: Our study demonstrated that automated data extraction using NLP is feasible and time-efficient. Additionally, NLP-extracted data can be used to identify valid and useful clinical endpoints for research. NLP holds significant potential to accelerate the extraction of real-world data for future observational studies.

Source: http://dx.doi.org/10.1016/j.lungcan.2025.108080

Similar Publications

Tumour-infiltrating Lymphocytes and Radiation Therapy in Rectal Cancer: Systematic Review and Meta-analysis.

Clin Oncol (R Coll Radiol)

December 2024

Faculty of Medicine and Health Sciences, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Belgium; Department of Radiation Oncology, Iridium Netwerk, Oosterveldlaan 22, 2610, Antwerp, Belgium. Electronic address:

Aim: Tumour-infiltrating lymphocytes (TILs) represent a promising cancer biomarker. Different TILs, including CD8+, CD4+, CD3+, and FOXP3+, have been associated with clinical outcomes. However, data are lacking regarding the value of TILs for patients receiving radiation therapy (RT).

Rare dual MYH9-ROS1 fusion variants in a patient with lung adenocarcinoma: A case report.

Medicine (Baltimore)

January 2025

Department of Respiratory and Critical Care Medicine, Zhongshan City People's Hospital, Zhongshan, Guangdong Province, China.

Rationale: ROS proto-oncogene 1 (ROS1) fusion is a rare but important driver mutation in non-small cell lung cancer, and tumors carrying it are usually highly sensitive to small-molecule tyrosine kinase inhibitors. With the widespread application of next-generation sequencing (NGS), more ROS1 fusions and co-mutations have been discovered. Non-muscle myosin heavy chain 9 (MYH9) has been reported as a rare fusion partner of the ROS1 gene.

Purpose: Adaptive radiotherapy accounts for interfractional anatomic changes. We hypothesize that changes in the gross tumor volumes identified during daily scans could be analyzed using delta-radiomics to predict disease progression events. We evaluated whether an auxiliary data set could improve prediction performance.

Purpose: Although lung cancer is one of the most common malignancies, the underlying genetics regarding susceptibility remain poorly understood. We characterized the spectrum of pathogenic/likely pathogenic (P/LP) germline variants within DNA damage response (DDR) genes among lung cancer cases and controls in non-Hispanic Whites (NHWs) and African Americans (AAs).

Materials and Methods: Rare germline variants in 67 DDR genes with evidence of pathogenicity were identified using the ClinVar database.

Purpose: Patients with chronic kidney disease (CKD) and end-stage renal disease (ESRD) have been noted to face increased cancer incidence. Yet, the impact of concomitant renal dysfunction on acute outcomes following elective surgery for cancer remains to be elucidated.

Methods: All adult hospitalizations entailing elective resection for lung, esophageal, gastric, pancreatic, hepatic, or colon cancer were identified in the 2016-2020 National Inpatient Sample.
