In this paper, we present an automatic classification system for pathological findings. Our starting point was a pathology tissue collection of about 1.4 million tissue samples, described by free-text records accumulated over 23 years. Extracting knowledge from this "big data" pool is a challenging task, especially when the data are unstructured and span many years. The classification is based on ontology-based term extraction and on decision trees built with a manually curated classification system. The information extraction system is based on regular expressions and a text substitution system. We describe how medical experts generate the decision trees using a visual editor, and we describe the evaluation of the classification process against a reference data set. We achieved an F-score of 89.7% for ICD-10 classification and an F-score of 94.7% for ICD-O classification. For the extraction of tumor staging and receptor information, we achieved F-scores ranging from 81.8% to 96.8%.
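To make the combination of text substitution and regular expressions more concrete, here is a minimal sketch of how such a pipeline could extract TNM staging and receptor status from a free-text report. It is not the authors' implementation: the substitution rules, the patterns, the function names `normalize` and `extract_findings`, and the assumption that reports are written in English are all hypothetical and chosen only for illustration. Substitution runs first so that spelling variants ("Estrogen receptor pos.") collapse to a canonical form ("ER positive") that a small set of extraction patterns can match.

```python
import re
from typing import Dict

# Hypothetical substitution rules (assumption, not the paper's rule set):
# each maps a spelling variant to one canonical form before extraction.
SUBSTITUTIONS = [
    (re.compile(r"\bestrogen receptor\b", re.IGNORECASE), "ER"),
    (re.compile(r"\bprogesterone receptor\b", re.IGNORECASE), "PR"),
    (re.compile(r"\bpos(?:itive)?\b", re.IGNORECASE), "positive"),
    (re.compile(r"\bneg(?:ative)?\b", re.IGNORECASE), "negative"),
]

# Hypothetical TNM patterns: optional y/r/p/c prefixes, then the category
# letter and its value. The lookbehind (instead of \b) also lets
# concatenated forms such as "pT1N0M0" split into their components.
TNM_PATTERNS = {
    "T": re.compile(r"(?<![A-Za-z])[yrpc]*T(is|[0-4X][a-d]?)"),
    "N": re.compile(r"(?<![A-Za-z])[yrpc]*N([0-3X][a-c]?)"),
    "M": re.compile(r"(?<![A-Za-z])[yrpc]*M([01X])"),
}

# Receptor status, matched only after normalization has run.
RECEPTOR_PATTERN = re.compile(r"\b(ER|PR)\s*:?\s*(positive|negative)\b")


def normalize(text: str) -> str:
    """Apply every substitution rule in order and return the rewritten text."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return text


def extract_findings(report: str) -> Dict[str, str]:
    """Return the first T/N/M value found plus every receptor status."""
    text = normalize(report)
    findings: Dict[str, str] = {}
    for category, pattern in TNM_PATTERNS.items():
        match = pattern.search(text)
        if match:
            findings[category] = match.group(0)  # keep prefix, e.g. "pT2"
    for receptor, status in RECEPTOR_PATTERN.findall(text):
        findings[receptor] = status
    return findings


if __name__ == "__main__":
    report = ("Invasive ductal carcinoma, pT2 pN1a M0. "
              "Estrogen receptor pos., progesterone receptor negative.")
    print(extract_findings(report))
    # {'T': 'pT2', 'N': 'pN1a', 'M': 'M0', 'ER': 'positive', 'PR': 'negative'}
```

Keeping normalization separate from extraction means new spelling variants can be handled by adding a substitution rule rather than by widening every extraction pattern; a production rule set would need far more variants (and, for non-English reports, language-specific ones) than this sketch shows.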
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5346425
DOI: http://dx.doi.org/10.1007/s12553-016-0169-8