Background: Missing data is a challenge for all studies, and especially for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results.

Objective: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered.
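
One common first check of the missingness mechanism is sketched below. This is a minimal illustration under stated assumptions, not necessarily the assessment used in this study. It compares a fully observed variable between records where a laboratory result is missing and records where it is present. A clear difference is evidence against missing completely at random (MCAR). It cannot by itself separate missing at random (MAR) from missing not at random (MNAR), because MNAR depends on values that were never observed. The function names and the toy age/HbA1c values are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for the difference in means of two samples.

    Each sample needs at least two values (statistics.variance requires it).
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def missingness_check(target: Sequence[Optional[float]],
                      covariate: Sequence[float]) -> dict:
    """Compare a fully observed covariate between rows where `target`
    is missing (None) and rows where it is observed (hypothetical helper)."""
    missing = [c for t, c in zip(target, covariate) if t is None]
    observed = [c for t, c in zip(target, covariate) if t is not None]
    return {
        "n_missing": len(missing),
        "n_observed": len(observed),
        "mean_if_missing": statistics.mean(missing),
        "mean_if_observed": statistics.mean(observed),
        # a large |t| suggests missingness depends on the covariate (not MCAR)
        "welch_t": welch_t(missing, observed),
    }


if __name__ == "__main__":
    # Hypothetical toy data: the lab value is missing mostly for younger patients.
    age = [25, 30, 35, 40, 60, 65, 70, 75]
    hba1c = [None, None, 5.4, None, 6.1, 6.5, 7.0, 6.8]
    print(missingness_check(hba1c, age))
```

In practice this comparison would be repeated for each laboratory variable against each fully observed covariate (age, sex, encounter counts, and so on).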

Methods: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods on missing data simulated under 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling).
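
The simulation design (mask values in complete cases, impute, then score only the masked entries) can be sketched as follows. This assumes a single laboratory variable `x` and one fully observed covariate `z`, and it is a deliberately simplified, deterministic version rather than the study's own code: MAR removes `x` where `z` is largest, and MNAR removes the largest values of `x` itself. Mean imputation is a hypothetical stand-in for the 12 methods that were actually compared, and all function names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def mask_values(x: Sequence[float], z: Sequence[float], mechanism: str,
                rate: float, seed: int = 0) -> List[Optional[float]]:
    """Return a copy of `x` with round(rate * len(x)) entries set to None."""
    rng = random.Random(seed)
    n = len(x)
    k = round(rate * n)
    if mechanism == "MCAR":
        # every entry is equally likely to be removed
        drop = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        # removal driven by the observed covariate z: rows with the largest z lose x
        drop = set(sorted(range(n), key=lambda i: z[i], reverse=True)[:k])
    elif mechanism == "MNAR":
        # removal driven by the value of x itself, which is then unobserved
        drop = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    return [None if i in drop else v for i, v in enumerate(x)]


def mean_impute(masked: Sequence[Optional[float]]) -> List[float]:
    """Replace each None with the mean of the observed values (placeholder method)."""
    observed = [v for v in masked if v is not None]
    mu = sum(observed) / len(observed)
    return [mu if v is None else v for v in masked]


def masked_rmse(truth: Sequence[float], masked: Sequence[Optional[float]],
                imputed: Sequence[float]) -> float:
    """Root-mean-square error computed only over the entries that were masked."""
    sq = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    x = [float(i) for i in range(1, 11)]       # "true" complete-case values
    z = [float(10 - i) for i in range(10)]     # fully observed covariate
    for mech in ("MCAR", "MAR", "MNAR"):
        masked = mask_values(x, z, mech, rate=0.3)
        print(mech, round(masked_rmse(x, masked, mean_impute(masked)), 3))
```

Because the true values are known for the masked entries, the same loop can score any imputation method under each mechanism; MNAR typically penalizes methods that assume the observed values are representative.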

Results: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation.
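
Multiple imputation produces m completed datasets whose analysis results must then be combined. The standard way to pool a single parameter is Rubin's rules, sketched below; the abstract does not state how pooling was carried out in this study, and `pool_rubin` is a hypothetical name. With per-dataset estimates Q_1..Q_m and variances U_1..U_m, the pooled estimate is the mean of the Q_i, and the total variance is T = U_bar + (1 + 1/m) * B, where U_bar is the mean within-imputation variance and B is the between-imputation variance of the Q_i. A method that returns identical imputations on every run yields B = 0, so T collapses to U_bar and understates the uncertainty due to missingness. This is one general reason a deterministic imputer can be unsuitable for multiple imputation, though the abstract does not say it is the reason observed here.

```python
import math
import statistics
from typing import Dict, Sequence


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Dict[str, float]:
    """Pool one parameter across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # mean within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b              # total variance
    return {"estimate": q_bar, "within": u_bar, "between": b,
            "total": t, "se": math.sqrt(t)}


if __name__ == "__main__":
    # Hypothetical regression coefficients from 5 stochastic imputations
    print(pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5))
    # A deterministic imputer gives identical estimates, so "between" is 0
    print(pool_rubin([1.0] * 5, [0.04] * 5))
```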

Conclusions: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5845101
DOI: http://dx.doi.org/10.2196/medinform.8960
