MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record.

Yuri Ahuja Yuesong Zou Aman Verma David Buckeridge Yue Li

J Biomed Inform

School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada. Electronic address:

Published: October 2022

Electronic Health Records (EHRs) contain rich clinical data collected at the point of the care, and their increasing adoption offers exciting opportunities for clinical informatics, disease risk prediction, and personalized treatment recommendation. However, effective use of EHR data for research and clinical decision support is often hampered by a lack of reliable disease labels. To compile gold-standard labels, researchers often rely on clinical experts to develop rule-based phenotyping algorithms from billing codes and other surrogate features. This process is tedious and error-prone due to recall and observer biases in how codes and measures are selected, and some phenotypes are incompletely captured by a handful of surrogate features. To address this challenge, we present a novel automatic phenotyping model called MixEHR-Guided (MixEHR-G), a multimodal hierarchical Bayesian topic model that efficiently models the EHR generative process by identifying latent phenotype structure in the data. Unlike existing topic modeling algorithms wherein the inferred topics are not identifiable, MixEHR-G uses prior information from informative surrogate features to align topics with known phenotypes. We applied MixEHR-G to an openly-available EHR dataset of 38,597 intensive care patients (MIMIC-III) in Boston, USA and to administrative claims data for a population-based cohort (PopHR) of 1.3 million people in Quebec, Canada. Qualitatively, we demonstrate that MixEHR-G learns interpretable phenotypes and yields meaningful insights about phenotype similarities, comorbidities, and epidemiological associations. Quantitatively, MixEHR-G outperforms existing unsupervised phenotyping methods on a phenotype label annotation task, and it can accurately estimate relative phenotype prevalence functions without gold-standard phenotype information. Altogether, MixEHR-G is an important step towards building an interpretable and automated phenotyping system using EHR data.

Download full-text PDF	Source
http://dx.doi.org/10.1016/j.jbi.2022.104190	DOI Listing

Publication Analysis

Top Keywords

surrogate features

topic modeling

automatic phenotyping

electronic health

ehr data

phenotyping

data

phenotype

mixehr-guided guided

guided multi-modal

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!