Purpose: Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs). Such documentation results in tedious and time-consuming abstraction efforts by tumor registrars and other secondary users. This information may be amenable to extraction by automated methods.

Methods: We developed a natural language processing algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR. These methods were developed in a training set of patients with lung cancer, independently validated in a test set of patients with lung cancer, and compared with the gold standard of Vanderbilt Cancer Registry–determined stage (when available).
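
The abstract does not give the extraction patterns or the discordance rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes stage statements look like "stage IIIA" or "Stage 3a", and it substitutes a simple most-frequent-stage vote for the paper's rules. The names `STAGE_RE`, `extract_stages`, and `choose_stage` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not from the paper): the word "stage"
# followed by a Roman or Arabic numeral 1-4 and an optional substage letter.
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([ABC])?\b", re.IGNORECASE)

ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_stages(text):
    """Return normalized stage strings (e.g. 'IIIA') found in one document."""
    stages = []
    for numeral, letter in STAGE_RE.findall(text):
        numeral = ARABIC_TO_ROMAN.get(numeral, numeral.upper())
        stages.append(numeral + letter.upper())
    return stages


def choose_stage(documents):
    """Pick one stage per patient and flag discordance.

    Stand-in rule (an assumption): the most frequently mentioned stage wins,
    and ties go to the stage mentioned first.
    Returns (stage or None, discordant flag).
    """
    counts = Counter()
    for doc in documents:
        counts.update(extract_stages(doc))
    if not counts:
        return None, False
    # max() keeps the first maximal item, and Counter preserves insertion order.
    stage, _ = max(counts.items(), key=lambda kv: kv[1])
    return stage, len(counts) > 1
```

For a patient whose notes mention "stage IIIA", "Stage 3a", and "stage IV", `choose_stage` would return `("IIIA", True)`: the two spellings normalize to the same stage, and the flag records that a discordant statement was also present.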

Results: In the combined data set of 2,323 patients (training set, n = 1,103; validation set, n = 1,220), 751,880 documents were analyzed. A stage statement was extracted from 2,239 (98.6%) patient EHRs (median, 24 documents per patient). Stage discordance was common, affecting 83.6% of these EHRs. Nevertheless, algorithmically derived stage accuracy was high in the validation set (κ = 0.906; 95% CI, 0.873 to 0.939), when including notes generated within 14 weeks from diagnosis.
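
Agreement with the registry stage is reported as κ. The abstract does not say whether a weighted or unweighted statistic was used or how the confidence interval was derived, so the sketch below assumes plain unweighted Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), and computes no interval. The function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(algorithm_stages, registry_stages):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(algorithm_stages) != len(registry_stages) or not algorithm_stages:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(algorithm_stages)
    # Observed agreement: fraction of patients with identical stage labels.
    p_o = sum(a == r for a, r in zip(algorithm_stages, registry_stages)) / n
    # Chance agreement: product of the marginal frequencies, summed over labels.
    alg_counts = Counter(algorithm_stages)
    reg_counts = Counter(registry_stages)
    p_e = sum(alg_counts[k] * reg_counts[k] for k in alg_counts) / (n * n)
    if p_e == 1.0:
        # Both sources used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

With four patients labeled I, I, II, II by the algorithm and I, I, II, I by the registry, observed agreement is 0.75 and chance agreement is 0.5, giving κ = 0.5.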

Conclusion: Accurate stage determination can be achieved through automated methods applied to narrative text, despite the frequent presence of discordance in such data. Our results also indicate that stage can be automatically captured in a shorter timeframe than the 6-month window used by cancer registries, as early as 5 weeks from diagnosis. These methods may be generalizable to large narrative cancer data sets.

Source
http://dx.doi.org/10.1200/JOP.2015.004622
