Data integration of structured and unstructured sources for assigning clinical codes to patient stays.

J Am Med Inform Assoc

Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp - Antwerp University Hospital, Belgium; ADReM (Advanced Database Research and Modelling), University of Antwerp, Antwerp, Belgium.

Published: April 2016

Objective: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically ICD-9-CM) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation.

Methods: Two separate data integration approaches were evaluated. Early data integration combines the features of all sources within a single model, whereas late data integration learns a separate model per data source and combines their predictions with a meta-learner. Both approaches were evaluated on data sources and clinical codes from a broad set of medical specialties.
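
The difference between the two integration strategies can be illustrated with a minimal sketch. The abstract does not state which learners, features, or training protocol were used, so everything below is an assumption made for illustration: a small hand-written logistic regression stands in for the base models and the meta-learner, the two toy feature tables (`structured`, `text`) and the labels are invented, and a single binary code is predicted rather than the full ICD-9-CM label set.

```python
import math

def predict_proba(w, x):
    """Logistic model: w[0] is the bias, w[1:] are the feature weights."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=1000, lr=0.5):
    """Fit logistic regression with batch gradient descent (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def early_integration(sources, y):
    """Early integration: concatenate all sources' features, fit one model."""
    X = [sum(rows, []) for rows in zip(*sources)]
    w = train_logreg(X, y)
    return lambda rows: predict_proba(w, sum(rows, []))

def late_integration(sources, y):
    """Late integration: one model per source, then a meta-learner on their outputs."""
    base = [train_logreg(X, y) for X in sources]
    meta_X = [[predict_proba(w, x) for w, x in zip(base, rows)]
              for rows in zip(*sources)]
    meta = train_logreg(meta_X, y)

    def predict(rows):
        scores = [predict_proba(w, x) for w, x in zip(base, rows)]
        return predict_proba(meta, scores)
    return predict

# Hypothetical toy data: one row per patient stay.
structured = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]  # e.g. coded flags
text = [[2, 0], [1, 0], [0, 1], [0, 2], [1, 0], [0, 1]]        # e.g. term counts
labels = [1, 1, 0, 0, 1, 0]                                    # code assigned or not

early = early_integration([structured, text], labels)
late = late_integration([structured, text], labels)

stays = list(zip(structured, text))
early_preds = [int(early(rows) >= 0.5) for rows in stays]
late_preds = [int(late(rows) >= 0.5) for rows in stays]
```

For brevity, the meta-learner here is fitted on the base models' predictions for their own training data. A more careful stacking setup would fit it on held-out (for example, cross-validated) predictions, so that it does not learn from overfitted base-model outputs.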

Results: When compared with the best individual prediction source, late data integration leads to improvements in predictive power (eg, overall F-measure increased from 30.6% to 38.3% for International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic codes), while early data integration is less consistent. The predictive strength strongly differs between medical specialties, both for ICD-9-CM diagnostic and procedural codes.
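
As a rough illustration of how an overall F-measure over code assignments can be computed, the sketch below pools true positives, false positives, and false negatives across all stays (micro-averaging). The abstract does not say how the reported F-measure was aggregated, so micro-averaging is an assumption here, and the function name and example code sets are hypothetical.

```python
def micro_f_measure(true_codes, predicted_codes):
    """Micro-averaged F-measure over per-stay sets of assigned codes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, predicted_codes):
        tp += len(truth & pred)   # codes correctly predicted
        fp += len(pred - truth)   # codes predicted but not assigned
        fn += len(truth - pred)   # codes assigned but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: two patient stays with illustrative code sets.
truth = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0"}, {"250.00", "401.9"}]
score = micro_f_measure(truth, predicted)
```

In this example two codes are predicted correctly, one is predicted in error, and one is missed, so precision and recall are both 2/3 and the F-measure is 2/3.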

Discussion: Structured data provides complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This complementarity is captured most effectively by the proposed late data integration approach.

Conclusions: We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4954635
DOI: http://dx.doi.org/10.1093/jamia/ocv115
