Background: Manual extraction of real-world clinical data for research can be time-consuming and prone to error. We assessed the feasibility of using natural language processing (NLP), an AI technique, to automate data extraction for patients with advanced lung cancer (aLC). We assessed the external validity of our NLP-extracted data by comparing our findings to those reported in the literature.
Methods: Patients diagnosed with stage IIIB or IV lung cancer between January 2015 and December 2017 at Princess Margaret Cancer Centre who received at least one dose of systemic therapy were included. Their electronic health records were provided to Pentavere's NLP platform, DARWEN, in March 2019. Descriptive statistics summarized baseline patient and cancer characteristics, molecular biomarkers, and first-line systemic therapies. Cox multivariate models were used to evaluate prognostic factors for the advanced non-small cell lung cancer (NSCLC) and small-cell lung cancer (SCLC) cohorts.
Results: NLP extracted clinical information (n = 333 patients) in a total of 8 hours, with few missing values for smoking status (n = 2) and Eastern Cooperative Oncology Group (ECOG) status (n = 5). Baseline patient and cancer characteristics summarized from NLP-extracted data were comparable to those in previous studies and population reports. For NSCLC patients, being male (HR 1.44, 95% CI [1.04, 2.00]), having worse ECOG (1.48 [1.22, 1.81]), and having liver (2.24 [1.45, 3.46]), bone (2.09 [1.48, 2.96]), or lung metastases (1.54 [1.05, 2.26]) were associated with worse survival outcomes. For SCLC patients, older age (HR 1.70 per 10 years, 95% CI [1.10, 2.63]) and liver metastases (3.81 [1.61, 9.01]) were associated with worse survival outcomes.
Conclusion: Our study demonstrated that automated data extraction using NLP is feasible and time efficient. Additionally, the NLP-extracted data can be used to identify valid and useful clinical endpoints for research. NLP holds significant potential to accelerate the extraction of real-world data for future observational studies.
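The age effect in the SCLC cohort is reported per 10-year increment, which can be hard to compare with studies that report hazard ratios per year. The sketch below shows one way to rescale such an estimate. It assumes that age entered the Cox model as a single linear term, which the abstract does not state, and the function name `rescale_hazard_ratio` is hypothetical rather than part of any published analysis code.

```python
import math


def rescale_hazard_ratio(hr: float, from_units: float, to_units: float) -> float:
    """Rescale a hazard ratio reported per `from_units` of a covariate
    to the equivalent hazard ratio per `to_units`.

    Assumption: the covariate is a linear term in the Cox model, so the
    log hazard ratio is proportional to the size of the increment.
    """
    if hr <= 0 or from_units <= 0:
        raise ValueError("hr and from_units must be positive")
    return math.exp(math.log(hr) * to_units / from_units)


# Age estimate and 95% CI bounds reported above for SCLC (per 10 years).
reported_per_10_years = {"HR": 1.70, "CI lower": 1.10, "CI upper": 2.63}

# Equivalent values per single year of age, under the linearity assumption.
per_year = {
    label: rescale_hazard_ratio(value, from_units=10, to_units=1)
    for label, value in reported_per_10_years.items()
}

for label, value in per_year.items():
    print(f"{label} per year: {value:.3f}")
```

The confidence bounds are rescaled the same way as the point estimate because the interval is constructed on the log scale. If age was modelled non-linearly or in categories, this conversion would not apply.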
DOI: http://dx.doi.org/10.1016/j.lungcan.2025.108080
Clin Oncol (R Coll Radiol)
December 2024
Faculty of Medicine and Health Sciences, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Belgium; Department of Radiation Oncology, Iridium Netwerk, Oosterveldlaan 22, 2610, Antwerp, Belgium.
Aim: Tumour-infiltrating lymphocytes (TILs) represent a promising cancer biomarker. Different TILs, including CD8+, CD4+, CD3+, and FOXP3+, have been associated with clinical outcomes. However, data are lacking regarding the value of TILs for patients receiving radiation therapy (RT).
Medicine (Baltimore)
January 2025
Department of Respiratory and Critical Care Medicine, Zhongshan City People's Hospital, Zhongshan, Guangdong Province, China.
Rationale: ROS proto-oncogene 1 (ROS1) fusion is a rare but important driver mutation in non-small cell lung cancer, which usually shows significant sensitivity to small-molecule tyrosine kinase inhibitors. With the widespread application of next-generation sequencing (NGS), more fusions and co-mutations of ROS1 have been discovered. Non-muscle myosin heavy chain 9 (MYH9) has been reported as a rare fusion partner of the ROS1 gene.
JCO Clin Cancer Inform
January 2025
Machine Learning Department, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL.
Purpose: Adaptive radiotherapy accounts for interfractional anatomic changes. We hypothesize that changes in the gross tumor volumes identified during daily scans could be analyzed using delta-radiomics to predict disease progression events. We evaluated whether an auxiliary data set could improve prediction performance.
JCO Precis Oncol
January 2025
Karmanos Cancer Institute and Department of Oncology, Wayne State University School of Medicine, Detroit, MI.
Purpose: Although lung cancer is one of the most common malignancies, the underlying genetics regarding susceptibility remain poorly understood. We characterized the spectrum of pathogenic/likely pathogenic (P/LP) germline variants within DNA damage response (DDR) genes among lung cancer cases and controls in non-Hispanic Whites (NHWs) and African Americans (AAs).
Materials and Methods: Rare, germline variants in 67 DDR genes with evidence of pathogenicity were identified using the ClinVar database.
PLoS One
January 2025
Cardiovascular Outcomes Research Laboratories (CORELAB), University of California, Los Angeles, Los Angeles, CA, United States of America.
Purpose: Patients with chronic kidney disease (CKD) and end-stage renal disease (ESRD) have been noted to face increased cancer incidence. Yet, the impact of concomitant renal dysfunction on acute outcomes following elective surgery for cancer remains to be elucidated.
Methods: All adult hospitalizations entailing elective resection for lung, esophageal, gastric, pancreatic, hepatic, or colon cancer were identified in the 2016-2020 National Inpatient Sample.