Population physiology: leveraging electronic health record data to understand human endocrine dynamics.

D J Albers, George Hripcsak, Michael Schmidt

PLoS One

Department of Biomedical Informatics, Columbia University, New York, New York, United States of America.

Published: June 2013

Studying physiology and pathophysiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data can be intrusive, dangerous, and expensive. One solution is to use data that have been collected for a different purpose. Electronic health record (EHR) data promise to support the development and testing of mechanistic physiologic models on diverse populations and allow correlation with clinical outcomes, but limitations in the data have thus far thwarted such use. For example, using uncontrolled population-scale EHR data to verify the outcome of time-dependent behavior of mechanistic, constructive models can be difficult because: (i) aggregation of the population can obscure or generate a signal, (ii) there is often no control population with a well understood health state, and (iii) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques. This paper shows that it is possible to use EHR data to test a physiological model for a population and over long time scales. Specifically, a methodology is developed and demonstrated for testing a mechanistic, time-dependent, physiological model of serum glucose dynamics with uncontrolled, population-scale, physiological patient data extracted from an EHR repository. It is shown that there is no observable daily variation in the normalized mean glucose for any EHR subpopulation. In contrast, a derived value, daily variation in nonlinear correlation quantified by the time-delayed mutual information (TDMI), did reveal the intuitively expected diurnal variation in glucose levels amongst a random population of humans. Moreover, in a population of continuously (tube) fed patients, there was no observable TDMI-based diurnal signal. These TDMI-based signals, via a glucose-insulin model, were then connected with human feeding patterns. In particular, a constructive physiological model was shown to correctly predict the difference between the general uncontrolled population and a subpopulation whose feeding was controlled.
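As an illustration of the time-delayed mutual information (TDMI) named in the abstract, the following is a minimal sketch that estimates the mutual information between a series and a lagged copy of itself. The abstract does not say which estimator the authors used, so the histogram (plug-in) estimator, the bin count, the function names `mutual_information` and `tdmi`, and the synthetic hourly signal are all assumptions made for illustration. The sketch also assumes uniformly sampled data, which real EHR measurements generally are not.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of mutual information, in nats.

    Assumption: a simple equal-width binning; this is not necessarily
    the estimator used in the paper.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    mask = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def tdmi(series, lags, bins=16):
    """Mutual information between series[t] and series[t + lag] for each lag."""
    series = np.asarray(series, dtype=float)
    values = []
    for lag in lags:
        if lag <= 0 or lag >= series.size:
            raise ValueError("each lag must satisfy 0 < lag < len(series)")
        values.append(mutual_information(series[:-lag], series[lag:], bins))
    return np.array(values)


# Hypothetical hourly signal with a 24-hour cycle plus noise (not real data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 200)
glucose_like = np.sin(2 * np.pi * hours / 24) + 0.3 * rng.standard_normal(hours.size)

for lag, value in zip([6, 12, 24], tdmi(glucose_like, [6, 12, 24])):
    print(f"lag {lag:2d} h: TDMI = {value:.3f} nats")
```

In a sketch like this, a diurnal signal would be expected to show up as elevated TDMI at lags near 24 hours compared with a series that has no daily structure. Note that the plug-in estimator is biased upward for small samples, so comparisons should use similar sample sizes.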

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3522687	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0048058	PLOS

Similar Publications

Machine learning based prediction models for cardiovascular disease risk using electronic health records data: systematic review and meta-analysis.

Eur Heart J Digit Health

January 2025

School of Life Course & Population Sciences, King's College London, SE1 1UL London, UK.

Tianyi Liu, Andrew Krentz, Lei Lu, Vasa Curcin

Cardiovascular disease (CVD) remains a major cause of mortality in the UK, prompting the need for improved risk predictive models for primary prevention. Machine learning (ML) models utilizing electronic health records (EHRs) offer potential enhancements over traditional risk scores like QRISK3 and ASCVD. This review aimed to systematically evaluate and compare the efficacy of ML models against conventional CVD risk prediction algorithms using EHR data for medium- to long-term (5-10 years) CVD risk prediction.

Evaluating dimensionality reduction of comorbidities for predictive modeling in individuals with neurofibromatosis type 1.

JAMIA Open

February 2025

Institute for Informatics, Data Science and Biostatistics, Washington University, Saint Louis, MO 63110, United States.

Aditi Gupta, Ethan Hillis, Inez Y Oh, Stephanie M Morris, Zach Abrams

Objective: Dimensionality reduction techniques aim to enhance the performance of machine learning (ML) models by reducing noise and mitigating overfitting. We sought to compare the effect of different dimensionality reduction methods for comorbidity features extracted from electronic health records (EHRs) on the performance of ML models for predicting the development of various sub-phenotypes in children with Neurofibromatosis type 1 (NF1).

Materials And Methods: EHR-derived data from pediatric subjects with a confirmed clinical diagnosis of NF1 were used to create 10 unique comorbidities code-derived feature sets by incorporating dimensionality reduction techniques using raw International Classification of Diseases codes, Clinical Classifications Software Refined, and Phecode mapping schemes.

High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates.

Am J Epidemiol

January 2025

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Janick Weberpals, Pamela A Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek

Multiple imputation (MI) models can be improved with auxiliary covariates (AC), but their performance in high-dimensional data remains unclear. We aimed to develop and compare high-dimensional MI (HDMI) methods using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation with acute kidney injury as outcome and simulated 100 cohorts with a null treatment effect, incorporating creatinine labs, atrial fibrillation (AFib), and other investigator-derived confounders in the outcome generation.

External validation of a novel cancer-associated venous thromboembolism risk assessment score in a safety-net hospital.

Res Pract Thromb Haemost

January 2025

Section of Hematology & Medical Oncology, Boston University School of Medicine, Boston, Massachusetts, USA.

Karlynn N Dulberger, Jennifer La, Ang Li, Saran Lotfollahzadeh, Asha Jose

Background: Cancer-associated thrombosis (CAT) is a leading cause of death in patients diagnosed with cancer. However, pharmacologic thromboprophylaxis use in cancer patients must be carefully evaluated due to a 2-fold increased risk of experiencing a major bleeding event within this population. The electronic health record CAT (EHR-CAT) risk assessment model (RAM) was recently developed, and reports improved performance over the widely used Khorana score.

A network-based systems genetics framework identifies pathobiology and drug repurposing in Parkinson's disease.

NPJ Parkinsons Dis

January 2025

Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA.

Lijun Dou, Zhenxing Xu, Jielin Xu, Chengxi Zang, Chang Su

Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder. However, current treatments only manage symptoms and lack the ability to slow or prevent disease progression. We utilized a systems genetics approach to identify potential risk genes and repurposable drugs for PD.
