Supervised line attention for tumor attribute classification from pathology reports: Higher performance with less data.

J Biomed Inform

Department of Statistics, University of California, Berkeley, United States; Department of Electrical Engineering and Computer Science, University of California, Berkeley, United States; Chan-Zuckerberg Biohub, San Francisco, CA, United States.

Published: October 2021

Objective: We aim to build an accurate machine learning-based system for classifying tumor attributes from cancer pathology reports when only a small amount of annotated data is available, motivated by the expensive and time-consuming nature of pathology report annotation. We use an enriched labeling scheme that records the location of the relevant information alongside the final label, together with a corresponding hierarchical method for classifying reports that leverages these enriched annotations.
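An enriched annotation can be thought of as a label plus a set of line indices. The minimal sketch below assumes a hypothetical `EnrichedAnnotation` record (the class, its field names, and the toy report lines are illustrative, not taken from the paper) and shows how the location part of the annotation can be turned into per-line 0/1 targets for a line-relevance model.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedAnnotation:
    """Hypothetical record: one attribute label plus the lines that support it."""
    attribute: str                                          # e.g. "tumor_grade"
    value: str                                              # the final label for the attribute
    evidence_lines: set[int] = field(default_factory=set)  # 0-based line indices


def line_targets(report_lines: list[str], ann: EnrichedAnnotation) -> list[int]:
    """Mark each report line 1 if the annotator flagged it as evidence, else 0."""
    return [1 if i in ann.evidence_lines else 0 for i in range(len(report_lines))]


# Made-up toy report; only line 2 carries the grade.
report = ["SPECIMEN: right colon", "DIAGNOSIS: adenocarcinoma", "Grade: moderately differentiated"]
ann = EnrichedAnnotation("tumor_grade", "moderately differentiated", {2})
print(line_targets(report, ann))  # [0, 0, 1]
```

The plain label (`value`) supervises the final prediction, while `evidence_lines` provides the extra line-level supervision that an ordinary document label lacks.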

Materials And Methods: Our data consist of 250 colon cancer and 250 kidney cancer pathology reports from 2002 to 2019 at the University of California, San Francisco. For each report, we classify attributes such as procedure performed, tumor grade, and tumor site. For each attribute and document, an annotator trained by an oncologist labeled both the value of that attribute and the specific lines in the document that indicated the value. We develop a model that uses these enriched annotations to first predict the relevant lines of the document and then predict the final value from the predicted lines. We compare our model to multiple state-of-the-art methods for classifying tumor attributes from pathology reports.
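The two-stage idea (select relevant lines, then classify from only those lines) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: the multinomial naive Bayes classifiers, the whitespace tokenizer, training the second stage on the annotated (gold) evidence lines, and falling back to the full report when no line is selected are all our own choices, and the toy reports are made up.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, treat ":" as a space, split on whitespace.
    return text.lower().replace(":", " ").split()


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(cy / n)  # class prior
            for w in tokenize(text):
                score += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = y, score
        return best


class HierarchicalClassifier:
    """Stage 1 flags relevant lines; stage 2 predicts the attribute value from them."""

    def __init__(self):
        self.line_model = NaiveBayes()   # stage 1: is this line relevant? (1/0)
        self.value_model = NaiveBayes()  # stage 2: attribute value

    def fit(self, reports, evidence, values):
        # reports: list of line lists; evidence: list of sets of line indices; values: labels
        lines, targets = [], []
        for rep, ev in zip(reports, evidence):
            for i, line in enumerate(rep):
                lines.append(line)
                targets.append(int(i in ev))
        self.line_model.fit(lines, targets)
        # Assumption: stage 2 is trained on the annotated evidence lines.
        gold_text = [" ".join(rep[i] for i in sorted(ev)) for rep, ev in zip(reports, evidence)]
        self.value_model.fit(gold_text, values)
        return self

    def predict(self, report):
        relevant = [line for line in report if self.line_model.predict(line) == 1]
        text = " ".join(relevant) if relevant else " ".join(report)  # fallback: whole report
        return self.value_model.predict(text)


# Made-up toy data: line 1 of each report is the annotated evidence for tumor grade.
reports = [
    ["Specimen: sigmoid colon", "Grade: low grade", "Margins negative"],
    ["Specimen: right colon", "Grade: high grade", "Margins negative"],
    ["Specimen: left kidney", "Grade: high grade", "Margins positive"],
]
model = HierarchicalClassifier().fit(reports, [{1}, {1}, {1}], ["low", "high", "high"])
print(model.predict(["Specimen: transverse colon", "Grade: low grade", "Margins negative"]))  # low
```

Restricting the second stage to the selected lines is what lets the line-level annotations act as extra supervision; any line scorer and value classifier could be substituted for the naive Bayes stand-ins.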

Results: Our results show that across colon and kidney cancers and varying training set sizes, our hierarchical method consistently outperforms state-of-the-art methods. Furthermore, performance comparable to these methods can be achieved with approximately half the amount of labeled data.

Conclusion: Document annotations that are enriched with location information are shown to greatly increase the sample efficiency of machine learning methods for classifying attributes of pathology reports.

DOI: http://dx.doi.org/10.1016/j.jbi.2021.103872
