Combining structured and unstructured data for predictive models: a deep learning approach.

BMC Med Inform Decis Mak

Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, OH, 43210, USA.

Published: October 2020

Background: The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. Following recent advances and successes, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many studies use temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.

Methods: In this research, we propose 2 general-purpose multi-modal neural network architectures that enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models use document embeddings to represent long clinical note documents, either convolutional neural networks or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding to represent static information. The three resulting representations are concatenated to form the final patient representation, which is used to make predictions.
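The fusion step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a convolutional variant, assumes the document embeddings for the notes are already computed, and uses untrained random weights. All function names, dimensions, and input values below are hypothetical and chosen only to show how the three representations are concatenated.

```python
import math
import random


def one_hot(index, size):
    """One-hot encode a categorical static feature."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def conv1d_max_pool(sequence, filters, biases):
    """Slide each filter over the time axis, apply ReLU, max-pool over time.

    sequence: list of T vectors, each of dimension D
    filters:  list of F filters, each a list of K weight vectors of dimension D
    Returns a list of F pooled activations.
    """
    pooled = []
    for filt, bias in zip(filters, biases):
        k = len(filt)
        best = 0.0  # ReLU floor: pooled activations are never below zero
        for t in range(len(sequence) - k + 1):
            act = bias + sum(
                w * x
                for step, w_vec in zip(sequence[t:t + k], filt)
                for x, w in zip(step, w_vec)
            )
            best = max(best, act)
        pooled.append(best)
    return pooled


def random_filters(rng, n_filters, kernel, dim):
    """Untrained filters, for illustration only."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(kernel)]
        for _ in range(n_filters)
    ]


def fuse(static_repr, temporal_repr, notes_repr):
    """Concatenate the three modality representations into one vector."""
    return static_repr + temporal_repr + notes_repr


def predict(patient_repr, weights, bias):
    """Logistic output layer on top of the fused patient representation."""
    logit = bias + sum(w * x for w, x in zip(weights, patient_repr))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)

# Hypothetical inputs: 1 static category out of 4, 24 time steps of
# 5 temporal signals, and 6 notes with 8-dimensional document embeddings.
static = one_hot(2, 4)
temporal = [[rng.random() for _ in range(5)] for _ in range(24)]
notes = [[rng.random() for _ in range(8)] for _ in range(6)]

temporal_repr = conv1d_max_pool(temporal, random_filters(rng, 3, 3, 5), [0.0] * 3)
notes_repr = conv1d_max_pool(notes, random_filters(rng, 3, 2, 8), [0.0] * 3)

patient = fuse(static, temporal_repr, notes_repr)
output_weights = [rng.uniform(-0.1, 0.1) for _ in patient]
risk = predict(patient, output_weights, 0.0)

print(len(patient), round(risk, 3))
```

In this sketch the fused vector has 4 + 3 + 3 = 10 entries. A long short-term memory variant would replace `conv1d_max_pool` with a recurrent encoder while leaving the concatenation and output layer unchanged.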

Results: We evaluate the performance of the proposed models on 3 risk prediction tasks (in-hospital mortality, 30-day hospital readmission, and long length of stay prediction) using data derived from the publicly available Medical Information Mart for Intensive Care III dataset. Our results show that, by combining unstructured clinical notes with structured data, the proposed models outperform models that use either unstructured notes or structured data alone.

Conclusions: The proposed fusion models learn better patient representations by combining structured and unstructured data. Integrating heterogeneous data types across EHRs helps improve the performance of prediction models and reduce errors.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7596962
DOI: http://dx.doi.org/10.1186/s12911-020-01297-6
