Incorporating informatively collected laboratory data from EHR in clinical prediction models.

BMC Med Inform Decis Mak

Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.

Published: July 2024

Background: Electronic Health Records (EHR) are widely used to develop clinical prediction models (CPMs). However, one challenge is that EHR data are often informatively missing. For example, laboratory measures are typically taken only when a clinician suspects there is a need. When data are Not Missing at Random (NMAR), analytic strategies that assume other missingness mechanisms are inappropriate. In this work, we compare the impact of different strategies for handling missing data on CPM performance.

Methods: We considered a predictive model for rapid inpatient deterioration as an exemplar implementation. This model incorporated twelve laboratory measures with varying levels of missingness. Five labs had missingness rates of around 50%, and the other seven had missingness rates of around 90%. We included them based on the belief that their missingness status can be highly informative for the prediction. We explicitly compared the following missing data strategies: mean imputation, normal-value imputation, conditional imputation, categorical encoding, and missingness embeddings. Some of these were also combined with last observation carried forward (LOCF). We implemented logistic LASSO regression, multilayer perceptron (MLP), and long short-term memory (LSTM) models as the downstream classifiers. We compared AUROC on the test data and used bootstrapping to construct 95% confidence intervals.
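Several of the strategies named above can be illustrated on a toy example. The following is a minimal sketch, not the authors' implementation: the column names, the lab ("lactate"), the "normal" reference value, and the category cut-point are all assumptions made for illustration. Conditional imputation and missingness embeddings are not shown, since the abstract does not define them.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per encounter and hour.
# NaN means the lab was not drawn at that time step.
df = pd.DataFrame({
    "encounter_id": [1, 1, 1, 2, 2, 2],
    "hour":         [0, 1, 2, 0, 1, 2],
    "lactate":      [np.nan, 2.4, np.nan, np.nan, np.nan, 1.1],
})

NORMAL_LACTATE = 1.0  # assumed reference value, for illustration only


def mean_impute(s: pd.Series) -> pd.Series:
    """Replace missing values with the mean of the observed values."""
    return s.fillna(s.mean())


def normal_value_impute(s: pd.Series, normal: float) -> pd.Series:
    """Replace missing values with a clinically 'normal' value."""
    return s.fillna(normal)


def locf(frame: pd.DataFrame, col: str) -> pd.Series:
    """Carry the last observed value forward within each encounter."""
    return frame.groupby("encounter_id")[col].ffill()


def categorical_encode(s: pd.Series, bins, labels) -> pd.Series:
    """Bin observed values and give missing values their own category."""
    cat = pd.cut(s, bins=bins, labels=labels)
    return cat.cat.add_categories("missing").fillna("missing")


df["lactate_mean"] = mean_impute(df["lactate"])
df["lactate_normal"] = normal_value_impute(df["lactate"], NORMAL_LACTATE)
# LOCF first, then fall back to the normal value where nothing was observed yet.
df["lactate_locf_normal"] = normal_value_impute(locf(df, "lactate"), NORMAL_LACTATE)
df["lactate_cat"] = categorical_encode(
    df["lactate"], bins=[-np.inf, 2.0, np.inf], labels=["normal", "high"]
)
print(df)
```

In this sketch, categorical encoding keeps "not measured" as an explicit level, so a downstream classifier can learn from the missingness itself, whereas mean imputation discards that signal.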

Results: We had 105,198 inpatient encounters, with 4.7% having experienced the deterioration outcome of interest. LSTM models generally outperformed the cross-sectional models, with embedding approaches and categorical encoding yielding the best results. For the cross-sectional models, normal-value imputation with LOCF generated the best results.
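These comparisons rest on test-set AUROC with bootstrap 95% confidence intervals, as described in the Methods. A minimal sketch of that evaluation follows, assuming a percentile bootstrap over test encounters (the abstract does not say which bootstrap variant was used). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score), dtype=float)
    ranks[order] = np.arange(1, len(y_score) + 1)
    for v in np.unique(y_score):  # assign tied scores their average rank
        mask = y_score == v
        ranks[mask] = ranks[mask].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        stats.append(auroc(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auroc(y_true, y_score), lo, hi


# Synthetic test set with roughly 5% positives (illustrative only).
rng = np.random.default_rng(42)
y = (rng.random(2000) < 0.05).astype(int)
scores = y + rng.normal(size=2000)  # positives score higher on average
est, lo, hi = bootstrap_auroc_ci(y, scores, n_boot=200)
print(f"AUROC {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With an outcome rate near 5%, some resamples may contain few positives, which is why the sketch skips single-class resamples rather than failing.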

Conclusion: Strategies that accounted for the possibility of NMAR missing data yielded better model performance than those that did not. The embedding method had an advantage in that it did not require prior clinical knowledge. Using LOCF could enhance the performance of cross-sectional models but could be counterproductive in LSTM models.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11270887
DOI: http://dx.doi.org/10.1186/s12911-024-02612-1

Publication Analysis

Top Keywords

missing data: 16
lstm models: 12
cross-sectional models: 12
clinical prediction: 8
models: 8
prediction models: 8
laboratory measures: 8
normal-value imputation: 8
categorical encoding: 8
data: 7

Similar Publications

Lobar pneumonia is an acute inflammation with increasing incidence globally. Delayed treatment can lead to severe complications, posing life-threatening risks. Thus, it is crucial to determine effective treatment methods to improve the prognosis of children with lobar pneumonia.

Socio-economic inequalities in second primary cancer incidence: A competing risks analysis of women with breast cancer in England between 2000 and 2018.

Int J Cancer

January 2025

Inequalities in Cancer Outcomes Network (ICON) group, Department of Health Services Research and Policy, Faculty of Public Health and Policy, London School of Hygiene & Tropical Medicine, London, UK.

We aimed to investigate socio-economic inequalities in second primary cancer (SPC) incidence among breast cancer survivors. Using data from cancer registries in England, we included all women diagnosed with a first primary breast cancer (PBC) between 2000 and 2018 and aged between 18 and 99 years, and followed them up from 6 months after the PBC diagnosis until an SPC event, death, or right censoring, whichever came first. We used flexible parametric survival models adjusting for age and year of PBC diagnosis, ethnicity, PBC tumour stage, comorbidity, and PBC treatments to model the cause-specific hazards of SPC incidence and death according to income deprivation, and then estimated standardised cumulative incidences of SPC by deprivation, taking death as the competing event.

Equine temporomandibular joint diseases: A systematic review.

Equine Vet J

January 2025

Department of Large Animal Diseases and Clinic, Institute of Veterinary Medicine, Warsaw University of Life Sciences (WULS - SGGW), Warsaw, Poland.

Background: The temporomandibular joint (TMJ) is a unique joint that enables mandibular movement. Temporomandibular diseases (TMDs) impair joint function, leading to more or less specific clinical signs.

Objectives: To compile and disseminate clinical data and research findings from existing publications on equine TMD.

JC polyomavirus (JCPyV) establishes a persistent, asymptomatic kidney infection in most of the population. However, JCPyV can reactivate in immunocompromised individuals and cause progressive multifocal leukoencephalopathy (PML), a fatal demyelinating disease with no approved treatment. Mutations in the hypervariable non-coding control region (NCCR) of the JCPyV genome have been linked to disease outcomes and neuropathogenesis, yet few meta-analyses document these associations.

Low-Complexity Timing Correction Methods for Heart Rate Estimation Using Remote Photoplethysmography.

Sensors (Basel)

January 2025

Department of Biomedical and Robotics Engineering, Incheon National University, Incheon 22012, Republic of Korea.

With the rise of modern healthcare monitoring, heart rate (HR) estimation using remote photoplethysmography (rPPG) has gained attention for its non-contact, continuous tracking capabilities. However, most HR estimation methods rely on stable, fixed sampling intervals, while practical image capture often involves irregular frame rates and missing data, leading to inaccuracies in HR measurements. This study addresses these issues by introducing low-complexity timing correction methods, including linear, cubic, and filter interpolation, to improve HR estimation from rPPG signals under conditions of irregular sampling and data loss.
