This paper presents an automatic classification system for pathological findings. The starting point was a pathological tissue collection of about 1.4 million tissue samples described by free-text records spanning 23 years. Extracting knowledge from this "big data" pool is challenging, especially because the data are unstructured and were recorded over many years. The classification is based on ontology-based term extraction and decision trees built with a manually curated classification system. The information extraction system is based on regular expressions and a text substitution system. We describe how medical experts generate the decision trees using a visual editor, and we evaluate the classification process against a reference data set. We achieved an F-score of 89.7% for ICD-10 and an F-score of 94.7% for ICD-O classification. For the information extraction of tumor staging and receptors, we achieved F-scores ranging from 81.8% to 96.8%.
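
The combination of text substitution and regular expressions can be illustrated with a minimal sketch. It is not the system described in the paper: the substitution rules, the TNM pattern, and the function names `normalize` and `extract_tnm` are hypothetical and chosen only to show how normalizing free text first can make a later regular-expression match simpler.

```python
import re

# Hypothetical substitution rules (not taken from the paper): each pair is
# (pattern, replacement) and is applied in order to normalize spelling variants.
SUBSTITUTIONS = [
    (re.compile(r"\s+"), " "),                        # collapse runs of whitespace
    (re.compile(r"\b([pc]?)T\s+(?=[0-4])"), r"\1T"),  # "pT 2" -> "pT2"
    (re.compile(r"\bN\s+(?=[0-3])"), "N"),            # "N 1"  -> "N1"
    (re.compile(r"\bM\s+(?=[01])"), "M"),             # "M 0"  -> "M0"
]

# Hypothetical, simplified TNM staging pattern; real staging notation has
# more variants than this sketch covers.
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[pc]?)T(?P<t>[0-4][a-d]?|is|x)\s*"
    r"[pc]?N(?P<n>[0-3][a-c]?|x)\s*"
    r"(?:[pc]?M(?P<m>[01]|x))?",
    re.IGNORECASE,
)


def normalize(text):
    """Apply every substitution rule in order and return the cleaned text."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return text.strip()


def extract_tnm(text):
    """Return the first TNM staging found in the text as a dict, or None."""
    match = TNM_PATTERN.search(normalize(text))
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "T": match.group("t"),
        "N": match.group("n"),
        "M": match.group("m"),  # None when no M component is present
    }


if __name__ == "__main__":
    print(extract_tnm("Tumor stage: pT 2  N 1 M 0, ER positive"))
```

Running the substitutions before the match keeps the extraction pattern short, because spacing variants are handled once in the rule list instead of in every expression.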

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5346425
DOI: http://dx.doi.org/10.1007/s12553-016-0169-8
