Using machine-learning algorithms to improve imputation in the Medical Expenditure Panel Survey.

Health Serv Res

Agency for Healthcare Research and Quality, Department of Health and Human Services, Rockville, Maryland, USA.

Published: April 2023

Objective: To assess the feasibility of applying machine learning (ML) methods to imputation in the Medical Expenditure Panel Survey (MEPS).

Data Sources: All data come from the 2016-2017 MEPS.

Study Design: Currently, expenditures for medical encounters in the MEPS are imputed with a predictive mean matching (PMM) algorithm in which a linear regression model is used to predict expenditures for events with (donors) and without (recipients) data. Recipient events and donor events are then matched based on the smallest distance between predicted expenditures, and the donor event's expenditures are used as the recipient event's imputation. We replace the linear regression algorithm in the PMM framework with ML methods to predict expenditures. We examine five alternatives to linear regression: Gradient Boosting, Random Forests, Extreme Random Forests, Deep Neural Networks, and a Stacked Ensemble approach. Additionally, we introduce an alternative matching scheme, which matches on a vector of predicted expenditures by source of payment instead of a single total expenditure prediction, with the aim of generating better matches.
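The PMM procedure described above can be illustrated with a minimal sketch. It is not the MEPS production code: the function names (`fit_ols`, `predict_ols`, `match_donors`, `pmm_impute`), the use of Euclidean distance, and the single-nearest-donor rule are assumptions made for illustration. The same matching function handles both a single total prediction (1-D input) and a vector of predictions by source of payment (2-D input).

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; y may be 1-D or 2-D."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predict with the coefficients returned by fit_ols."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

def match_donors(pred_donor, pred_recipient):
    """Return, for each recipient, the index of the donor whose prediction
    is closest (Euclidean distance is an assumption of this sketch)."""
    d = np.asarray(pred_donor, dtype=float).reshape(len(pred_donor), -1)
    r = np.asarray(pred_recipient, dtype=float).reshape(len(pred_recipient), -1)
    dist = np.linalg.norm(r[:, None, :] - d[None, :, :], axis=2)
    return dist.argmin(axis=1)

def pmm_impute(X_donor, y_donor, X_recipient):
    """Impute recipient expenditures with the matched donor's observed values."""
    coef = fit_ols(X_donor, y_donor)
    idx = match_donors(predict_ols(coef, X_donor), predict_ols(coef, X_recipient))
    return np.asarray(y_donor)[idx]

# Hypothetical predictions: matching on a single total expenditure.
total_idx = match_donors([10.0, 50.0, 200.0], [12.0, 180.0])

# Hypothetical predictions by two sources of payment. Both donors have the
# same total (10), so only the vector distinguishes them.
vector_idx = match_donors([[10.0, 0.0], [0.0, 10.0]], [[9.0, 1.0]])
```

Swapping `fit_ols` and `predict_ols` for any other predictor (for example a gradient boosting model) leaves the matching step unchanged, which is the sense in which the prediction model is replaceable within the PMM framework.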

Data Collection: Study data is derived from a large federal survey.

Principal Findings: ML algorithms perform better at both prediction and matching imputation than Ordinary Least Squares (OLS), the most common prediction algorithm used in PMM. On average, the Stacked Ensemble approach that combines all the ML algorithms performs best, improving expenditure prediction R² by 108% (0.156 points) and final imputation R² by 227% (0.397 points). Matching on a prediction vector also improves alignment of sources of payments between donor and recipient events.
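The paired relative and absolute gains in the reported fit statistic imply approximate baseline values, which the abstract does not state directly. The short calculation below is a back-of-envelope sketch, assuming each percentage is the absolute gain divided by the OLS baseline; the helper name `implied_baseline` is hypothetical, and the results are subject to rounding in the reported figures.

```python
def implied_baseline(abs_gain, rel_gain_pct):
    """Baseline b such that b * (rel_gain_pct / 100) equals abs_gain."""
    return abs_gain / (rel_gain_pct / 100.0)

# (label, absolute gain in points, relative gain in percent) from the abstract
reported = [("prediction", 0.156, 108), ("imputation", 0.397, 227)]

for label, gain, pct in reported:
    base = implied_baseline(gain, pct)
    print(f"{label}: implied baseline ~ {base:.3f}, implied ensemble ~ {base + gain:.3f}")
```

Under that assumption, the baseline values come out at roughly 0.14 for prediction and 0.17 for final imputation.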

Conclusions: ML algorithms and an alternative matching scheme improve the overall quality of expenditure PMM imputation in the MEPS. These methods may have additional value in other national surveys that currently rely on PMM or similar methods for imputation.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10012220
DOI: http://dx.doi.org/10.1111/1475-6773.14115
