Purpose: Large-scale analysis of real-world evidence is often limited to structured data fields that do not contain reliable information on recurrence status and disease sites. In this report, we describe a natural language processing (NLP) framework that uses free-text, unstructured reports to classify recurrence status and sites of recurrence for patients with breast cancer and hepatocellular carcinoma (HCC).

Methods: Using two cohorts of breast cancer and HCC cases, we validated the ability of a previously developed NLP model to distinguish between no recurrence, local recurrence, and distant recurrence, based on clinician notes, radiology reports, and pathology reports compared with manual curation. A second NLP model was trained and validated to identify sites of recurrence. We evaluated the ability of each NLP model to identify the presence, timing, and site of recurrence against manual chart review and International Classification of Diseases coding.

Results: A total of 1,273 patients were included in the development and validation of the two models. The NLP model for recurrence detected distant recurrence with an area under the curve of 0.98 (95% CI, 0.96 to 0.99) and 0.95 (95% CI, 0.88 to 0.98) in the breast cancer and HCC cohorts, respectively. The mean accuracy of the NLP model for detecting any site of distant recurrence was 0.9 for breast cancer and 0.83 for HCC. The NLP model for recurrence identified a larger proportion of patients with distant recurrence in a breast cancer database (11.1%) compared with International Classification of Diseases coding (2.31%).
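
For readers who want to see how an area under the curve with a 95% CI of the kind reported above can be computed, the following is a minimal sketch, not the authors' code. It assumes binary labels (1 = distant recurrence, 0 = otherwise) and a percentile bootstrap for the interval; the abstract does not state how the intervals were derived, so the bootstrap, the function names `auc` and `bootstrap_ci`, and the toy data are all illustrative assumptions.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data for illustration only; these are not values from the study.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90, 0.95]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

The pairwise form of the AUC is quadratic in cohort size, which is acceptable for cohorts of around a thousand patients; a rank-based implementation would be preferable for much larger datasets.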

Conclusion: We developed two NLP models to identify distant cancer recurrence, timing of recurrence, and sites of recurrence based on unstructured electronic health record data. These models can be used to perform large-scale retrospective studies in oncology.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8462655
DOI: http://dx.doi.org/10.1200/CCI.20.00165

Publication Analysis

Top Keywords

nlp model: 24
distant recurrence: 20
recurrence: 18
sites recurrence: 12
breast cancer: 12
natural language: 8
language processing: 8
distant cancer: 8
cancer recurrence: 8
recurrence sites: 8

Similar Publications

Exam protocoling is a significant non-interpretive task burden for radiologists. The purpose of this work was to develop a natural language processing (NLP) artificial intelligence (AI) solution for automated protocoling of standard abdomen and pelvic magnetic resonance imaging (MRI) exams from basic associated order information and patient metadata. This Institutional Review Board exempt retrospective study used de-identified metadata from consecutive adult abdominal and pelvic MRI scans performed at our institution spanning 2.

Background And Aim: Prior investigations of the natural history of abdominal aortic aneurysms (AAAs) have been constrained by small sample sizes or uneven assessments of aggregated data. Natural language processing (NLP) can significantly enhance the investigation and treatment of patients with AAAs by swiftly and effectively collecting imaging data from health records. This meta-analysis aimed to evaluate the efficacy of NLP techniques in reliably identifying the existence or absence of AAAs and measuring the maximal abdominal aortic diameter in extensive datasets of radiology study reports.

Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland.

Clinical entity-aware domain adaptation in low resource setting for inflammatory bowel disease.

Front Artif Intell

January 2025

Language Intelligence and Information Retrieval (LIIR) Lab, Department of Computer Science, KU Leuven, Leuven, Belgium.

The digitization of healthcare records has revolutionized medical research and patient care, with electronic health records (EHRs) containing a wealth of structured and unstructured data. Extracting valuable information from unstructured clinical text presents a significant challenge, necessitating automated tools for efficient data mining. Natural language processing (NLP) methods have been pivotal in this endeavor, aiming to extract crucial clinical concepts embedded within free-form text.

Introduction: Unsupervised feature learning methods inspired by natural language processing (NLP) models are capable of constructing patient-specific features from longitudinal Electronic Health Records (EHR).

Design: We applied document embedding algorithms to real-world paediatric intensive care (PICU) EHR data to extract patient-specific features from 1853 patients' PICU journeys using 647 unique lab tests and medication events. We evaluated the clinical utility of the patient features via a K-means clustering analysis.
