Studying physiology and pathophysiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data can be intrusive, dangerous, and expensive. One solution is to use data that have been collected for a different purpose. Electronic health record (EHR) data promise to support the development and testing of mechanistic physiologic models on diverse populations and allow correlation with clinical outcomes, but limitations in the data have thus far thwarted such use. For example, using uncontrolled population-scale EHR data to verify the outcome of time-dependent behavior of mechanistic, constructive models can be difficult because: (i) aggregation of the population can obscure or generate a signal, (ii) there is often no control population with a well-understood health state, and (iii) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques. This paper shows that it is possible to use EHR data to test a physiological model for a population and over long time scales. Specifically, a methodology is developed and demonstrated for testing a mechanistic, time-dependent, physiological model of serum glucose dynamics with uncontrolled, population-scale, physiological patient data extracted from an EHR repository. It is shown that there is no observable daily variation in the normalized mean glucose for any EHR subpopulation. In contrast, a derived value, daily variation in nonlinear correlation quantified by the time-delayed mutual information (TDMI), did reveal the intuitively expected diurnal variation in glucose levels amongst a random population of humans. Moreover, in a population of continuously (tube) fed patients, there was no observable TDMI-based diurnal signal. These TDMI-based signals, via a glucose-insulin model, were then connected with human feeding patterns. In particular, a constructive physiological model was shown to correctly predict the difference between the general uncontrolled population and a subpopulation whose feeding was controlled.
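To make the TDMI quantity mentioned in the abstract concrete, here is a minimal Python sketch. It uses the standard definition I(τ) = Σ p(x_t, x_{t+τ}) log[ p(x_t, x_{t+τ}) / (p(x_t) p(x_{t+τ})) ] with a plain histogram estimator. The function names, the bin count, the uniform hourly sampling, and the synthetic sine-plus-noise series are all assumptions made for illustration; this is not the estimator or the data handling used in the paper, which works with irregularly sampled population data.

```python
import numpy as np


def time_delayed_mutual_information(x, delay, bins=8):
    """Histogram estimate (in nats) of the mutual information between
    x[t] and x[t + delay] for a uniformly sampled series.

    Illustrative only: a simple plug-in estimator, not the paper's method.
    """
    x = np.asarray(x, dtype=float)
    if delay < 0 or delay >= len(x):
        raise ValueError("delay must satisfy 0 <= delay < len(x)")

    # Pair each sample with the sample `delay` steps later.
    a = x[: len(x) - delay]
    b = x[delay:]

    # Joint and marginal probabilities from a 2-D histogram.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # shape (1, bins)

    # Sum only over occupied cells to avoid log(0).
    occupied = p_ab > 0
    ratio = p_ab[occupied] / (p_a @ p_b)[occupied]
    return float(np.sum(p_ab[occupied] * np.log(ratio)))


def demo_series(days=200, noise=0.3, seed=0):
    """Hypothetical hourly signal with a 24-hour cycle plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    hours = np.arange(24 * days)
    return np.sin(2 * np.pi * hours / 24.0) + noise * rng.normal(size=hours.size)


if __name__ == "__main__":
    series = demo_series()
    shuffled = np.random.default_rng(1).permutation(series)
    print("TDMI at 24 h, periodic series:", time_delayed_mutual_information(series, 24))
    print("TDMI at 24 h, shuffled series:", time_delayed_mutual_information(shuffled, 24))
```

For this synthetic series, the value at a 24-hour delay should stand well above the value for a shuffled copy, which is the kind of contrast the abstract describes between the general population and continuously fed patients. Real EHR measurements are sparse and irregular, so they would need a different pairing and estimation strategy.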
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3522687 | PMC |
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0048058 | PLOS |
Eur Heart J Digit Health
January 2025
School of Life Course & Population Sciences, King's College London, SE1 1UL London, UK.
Cardiovascular disease (CVD) remains a major cause of mortality in the UK, prompting the need for improved risk predictive models for primary prevention. Machine learning (ML) models utilizing electronic health records (EHRs) offer potential enhancements over traditional risk scores like QRISK3 and ASCVD. This study aimed to systematically evaluate and compare the efficacy of ML models against conventional CVD risk prediction algorithms using EHR data for medium to long-term (5-10 years) CVD risk prediction.
JAMIA Open
February 2025
Institute for Informatics, Data Science and Biostatistics, Washington University, Saint Louis, MO 63110, United States.
Objective: Dimensionality reduction techniques aim to enhance the performance of machine learning (ML) models by reducing noise and mitigating overfitting. We sought to compare the effect of different dimensionality reduction methods for comorbidity features extracted from electronic health records (EHRs) on the performance of ML models for predicting the development of various sub-phenotypes in children with Neurofibromatosis type 1 (NF1).
Materials and Methods: EHR-derived data from pediatric subjects with a confirmed clinical diagnosis of NF1 were used to create 10 unique comorbidity code-derived feature sets by incorporating dimensionality reduction techniques using raw International Classification of Diseases codes, Clinical Classifications Software Refined, and Phecode mapping schemes.
Am J Epidemiol
January 2025
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Multiple imputation (MI) models can be improved with auxiliary covariates (AC), but their performance in high-dimensional data remains unclear. We aimed to develop and compare high-dimensional MI (HDMI) methods using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation with acute kidney injury as outcome and simulated 100 cohorts with a null treatment effect, incorporating creatinine labs, atrial fibrillation (AFib), and other investigator-derived confounders in the outcome generation.
Res Pract Thromb Haemost
January 2025
Section of Hematology & Medical Oncology, Boston University School of Medicine, Boston, Massachusetts, USA.
Background: Cancer-associated thrombosis (CAT) is a leading cause of death in patients diagnosed with cancer. However, pharmacologic thromboprophylaxis use in cancer patients must be carefully evaluated due to a 2-fold increased risk of experiencing a major bleeding event within this population. The electronic health record CAT (EHR-CAT) risk assessment model (RAM) was recently developed, and reports improved performance over the widely used Khorana score.
NPJ Parkinsons Dis
January 2025
Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA.
Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder. However, current treatments only manage symptoms and lack the ability to slow or prevent disease progression. We utilized a systems genetics approach to identify potential risk genes and repurposable drugs for PD.