Objectives: In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to assess whether the manual documentation process can be made more efficient by applying natural language processing (NLP) methods for multiclass classification of free-text diagnostic reports, so that the diagnosis and disease state of myeloma patients are documented automatically. The first objective was to create a corpus of free-text diagnosis paragraphs from German diagnostic reports of patients with multiple myeloma, with the relevant data elements manually annotated by documentation specialists. The second objective was to construct and evaluate a framework that uses different NLP methods to automatically classify the relevant data elements in free-text diagnostic reports.

Methods: The main diagnosis paragraph was extracted from the clinical reports of a randomly selected third of the patients in the multiple myeloma research database at Heidelberg University Hospital (737 patients in total). An EDC system was set up, and two data entry specialists independently documented at least nine specific data elements for multiple myeloma characterization. The two data entries were compared and assessed by a third specialist, and an annotated text corpus was created. A framework was constructed consisting of a self-developed package that splits multiple-diagnosis sequences into several subsequences, four different preprocessing steps that normalize the input data, and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total, 15 different pipelines were examined and assessed by ten-fold cross-validation, repeated 100 times. The average error rate and the average F1-score were calculated as quality indicators. The approximate randomization test was used for significance testing.
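The evaluation measures named in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the F1-score was macro-averaged or which statistic the approximate randomization test compared, so macro-averaging and the absolute difference in a chosen metric are assumptions here. The function names and the class labels in the usage note are hypothetical.

```python
import random


def error_rate(gold, pred):
    """Fraction of items whose predicted class differs from the gold class."""
    return sum(g != p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred.

    Macro-averaging is an assumption; the abstract does not specify it.
    """
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def approximate_randomization(gold, pred_a, pred_b, metric, trials=1000, seed=0):
    """Two-sided approximate randomization test for paired system outputs.

    Under the null hypothesis the two systems are interchangeable, so each
    pair of predictions is swapped with probability 0.5 and the metric
    difference is recomputed. Returns the estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(metric(gold, pred_a) - metric(gold, pred_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        diff = abs(metric(gold, shuffled_a) - metric(gold, shuffled_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```

For example, with hypothetical gold labels `["IgG", "IgA", "IgG", "LC"]` and the outputs of two pipelines on the same four items, `approximate_randomization(gold, pred_mec, pred_svm, error_rate)` estimates how often a difference in error rate at least as large as the observed one would arise if the two pipelines were interchangeable.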

Results: The annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had a significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even though the F1-score decreased only slightly.

Conclusions: The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no single best practice for automatic classification of data elements from free-text diagnostic reports.

Source
http://dx.doi.org/10.3414/ME15-02-0019

