Flexible propensity score estimation strategies for clustered data in observational studies.

Ting-Hsuan Chang, Trang Quynh Nguyen, Youjin Lee, John W Jackson, Elizabeth A Stuart

Stat Med

Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.

Published: November 2022

Nonparametric machine learning methods for propensity score estimation may outperform logistic regression, but it is unclear whether that advantage holds in clustered settings with unmeasured cluster-level confounding.
This study compares logistic regression, Bayesian additive regression trees, and generalized boosted modeling for propensity score weighting in clustered data, accounting for clustering with either cluster indicators or random intercepts.
Results suggest that when sample and cluster sizes are large, nonparametric methods may improve covariate balance, bias reduction, and confidence interval coverage. When sample or cluster sizes are small, they may be more vulnerable to unmeasured cluster-level confounding and may not be a better alternative to multilevel logistic regression.

Existing studies have suggested superior performance of nonparametric machine learning over logistic regression for propensity score estimation. However, it is unclear whether the advantages of nonparametric propensity score modeling carry over to settings where individuals are clustered, especially when there is unmeasured cluster-level confounding. In this work, we examined the performance of logistic regression (all main effects), Bayesian additive regression trees, and generalized boosted modeling for propensity score weighting in clustered settings, with clustering accounted for by including either cluster indicators or random intercepts. We simulated data for three hypothetical observational studies of varying sample and cluster sizes. Confounders were generated at both the individual and cluster levels, including a cluster-level confounder that is unobserved in the analyses. A binary treatment and a continuous outcome were generated based on seven scenarios with varying relationships between the treatment and confounders (linear and additive, nonlinear/nonadditive, nonadditive with the unobserved cluster-level confounder). Results suggest that when the sample and cluster sizes are large, nonparametric propensity score estimation may provide better covariate balance, bias reduction, and 95% confidence interval coverage, regardless of the degree of nonlinearity or nonadditivity in the true propensity score model. When the sample or cluster sizes are small, however, nonparametric approaches may become more vulnerable to unmeasured cluster-level confounding and thus may not be a better alternative to multilevel logistic regression. We applied the methods to the National Longitudinal Study of Adolescent to Adult Health data, estimating the effect of team sports participation during adolescence on adulthood depressive symptoms.
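
As a rough illustration of the weighting step described above, the Python sketch below takes propensity scores that have already been estimated (by any of the three methods, with clustering handled through cluster indicators or random intercepts) and computes inverse probability weights, a weighted standardized mean difference as a balance check, and a weighted difference in outcome means. It is a minimal sketch under assumptions not stated in the abstract: the weights target the average treatment effect, balance is scaled by the unweighted pooled standard deviation, and all function names and toy numbers are hypothetical rather than taken from the study.

```python
import math


def ate_weights(treat, ps):
    """Inverse probability weights: 1/e for treated units, 1/(1 - e) for controls."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treat, ps)]


def _weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def _group(values, treat, weights, arm):
    """Values and weights for one treatment arm (1 = treated, 0 = control)."""
    vals = [v for v, t in zip(values, treat) if t == arm]
    wts = [w for w, t in zip(weights, treat) if t == arm]
    return vals, wts


def weighted_mean_difference(values, treat, weights):
    """Weighted mean among the treated minus weighted mean among controls."""
    v1, w1 = _group(values, treat, weights, 1)
    v0, w0 = _group(values, treat, weights, 0)
    return _weighted_mean(v1, w1) - _weighted_mean(v0, w0)


def _sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def weighted_smd(covariate, treat, weights):
    """Weighted mean difference divided by the unweighted pooled SD (assumed metric)."""
    v1, _ = _group(covariate, treat, weights, 1)
    v0, _ = _group(covariate, treat, weights, 0)
    pooled_sd = math.sqrt((_sample_variance(v1) + _sample_variance(v0)) / 2.0)
    return weighted_mean_difference(covariate, treat, weights) / pooled_sd


# Hypothetical toy data: six individuals with made-up propensity scores.
treat = [1, 1, 1, 0, 0, 0]
ps = [0.8, 0.6, 0.5, 0.5, 0.4, 0.2]
age = [16.0, 17.0, 15.0, 15.0, 14.0, 16.0]
outcome = [4.0, 6.0, 5.0, 7.0, 6.0, 8.0]

w = ate_weights(treat, ps)
print("weighted SMD for age:", weighted_smd(age, treat, w))
print("weighted outcome difference:", weighted_mean_difference(outcome, treat, w))
```

A weighted standardized mean difference near zero indicates that the weights have balanced that covariate across treatment groups. Balance on observed covariates says nothing about an unobserved cluster-level confounder, which is the vulnerability the abstract highlights for small samples.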

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9996644
DOI: http://dx.doi.org/10.1002/sim.9551
