Automatic information extraction from childhood cancer pathology reports.

Hong-Jun Yoon, Alina Peluso, Eric B Durbin, Xiao-Cheng Wu, Antoinette Stroup, Jennifer Doherty, Stephen Schwartz, Charles Wiggins, Linda Coyle, Lynne Penberthy

JAMIA Open

National Cancer Institute, Bethesda, Maryland, USA.

Published: July 2022

Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, no machine learning models have been developed for ICCC classification. We have developed deep learning-based models that extract information from cancer pathology reports according to the ICD-O-3 coding standard. In this article, we describe extending those models to perform ICCC classification.

Materials and Methods: We developed 2 models: ICD-O-3 classification followed by ICCC recoding (Model 1) and direct ICCC classification (Model 2). We evaluated them under 4 scenarios that differ in training sample size, using a corpus of 29 206 reports with age at diagnosis between 0 and 19 years from 6 state cancer registries.
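
The structural difference between the two models can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (site, histology) lookup key, and all codes are hypothetical placeholders, and the classifiers are toy stand-ins for trained deep learning models.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key type: an ICD-O-3 (site, histology) pair.
IcdO3Code = Tuple[str, str]


def model1_predict(
    report: str,
    icdo3_classifier: Callable[[str], IcdO3Code],
    recode_table: Dict[IcdO3Code, str],
) -> Optional[str]:
    """Model 1 style: predict ICD-O-3 codes, then recode them to ICCC."""
    icdo3_code = icdo3_classifier(report)
    # Returns None when the predicted pair has no entry in the recode table.
    return recode_table.get(icdo3_code)


def model2_predict(report: str, iccc_classifier: Callable[[str], str]) -> str:
    """Model 2 style: predict the ICCC label directly from the report."""
    return iccc_classifier(report)


# Placeholder recode entries; these are NOT real ICD-O-3 or ICCC codes.
toy_recode_table = {
    ("SITE_A", "HIST_1"): "ICCC_X",
    ("SITE_A", "HIST_2"): "ICCC_Y",
}


def toy_icdo3_classifier(report: str) -> IcdO3Code:
    # Toy rule standing in for a trained ICD-O-3 model.
    return ("SITE_A", "HIST_1") if "alpha" in report else ("SITE_B", "HIST_9")


def toy_iccc_classifier(report: str) -> str:
    # Toy rule standing in for a trained ICCC model.
    return "ICCC_X" if "alpha" in report else "ICCC_Y"
```

In this sketch, any error in the intermediate ICD-O-3 prediction carries through the lookup to the ICCC label, whereas the direct model has no intermediate step.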

Results: Our findings suggest that direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). When an uncertainty quantification mechanism was applied to assess the model's confidence in assigning a code, the model achieved a micro-F1 score of 0.987 while abstaining (that is, not being sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.
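
To make the abstention idea concrete, the sketch below scores a classifier only on the reports it chooses to code. The abstract does not specify the uncertainty quantification mechanism, so the confidence-threshold rule here is an assumption, as are the function names and the threshold value. For single-label classification, micro-F1 over the retained reports equals their accuracy.

```python
from typing import List, Optional, Sequence, Tuple


def predict_or_abstain(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the top class index, or None when its probability is below threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def micro_f1_with_abstention(
    prob_rows: List[Sequence[float]],
    gold: List[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (micro-F1 on retained reports, fraction of reports abstained on)."""
    preds = [predict_or_abstain(p, threshold) for p in prob_rows]
    kept = [(p, g) for p, g in zip(preds, gold) if p is not None]
    abstain_rate = 1 - len(kept) / len(preds)
    # With exactly one predicted label per retained report,
    # micro-F1 reduces to accuracy on the retained set.
    correct = sum(1 for p, g in kept if p == g)
    micro_f1 = correct / len(kept) if kept else float("nan")
    return micro_f1, abstain_rate


# Toy class probabilities for four reports (illustrative values only).
toy_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
toy_gold = [0, 1, 1, 1]
score, abstained = micro_f1_with_abstention(toy_probs, toy_gold, threshold=0.7)
```

Raising the threshold generally trades coverage for accuracy: more reports are deferred to human annotators, and the score on the remaining reports tends to rise.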

Conclusions: Our experimental results suggest that machine learning-based automatic extraction of ICCC information from childhood cancer pathology reports is a reliable means of supplementing human annotators at state cancer registries, reading and abstracting the majority of childhood cancer pathology reports accurately.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202570
DOI: http://dx.doi.org/10.1093/jamiaopen/ooac049
