Balancing Inferential Integrity and Disclosure Risk via Model Targeted Masking and Multiple Imputation.

Bei Jiang, Adrian E Raftery, Russell J Steele, Naisyin Wang

J Am Stat Assoc

Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA.

Published: May 2021

There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, and with it a growing concern about data privacy. A strategy to protect individuals' identities is to release multiply imputed (MI) synthetic datasets in which sensitive values are masked (Rubin, 1993). However, information loss or incorrectly specified imputation models can weaken or invalidate the inferences obtained from the MI datasets. Studying a restricted-use Canadian Scleroderma Research Group (CSRG) dataset, the authors investigate a new masking framework with a data-augmentation (DA) component and a tuning mechanism that balances protection against identity disclosure with preservation of data utility. In a work-disability study and an interstitial lung disease study, this DA-MI strategy reached 0% identity-disclosure risk, preserved all inferential conclusions, and produced average 95% confidence interval (CI) overlaps of 98.5% and 95.5%, respectively, relative to the CIs constructed from the original CSRG dataset; the lowest CI-overlap value was 91%. The currently used methods did not perform as well: their average CI-overlap values ranged from 73.9% to 91.8%, with the lowest value being 28.1%. These findings indicate that the DA-MI masking framework facilitates sharing of useful research data while protecting participants' identities.
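
The abstract reports CI-overlap percentages without defining the measure. As a rough illustration only, the sketch below assumes a commonly used definition from the data-utility literature: the length of the intersection of the two intervals, divided by each interval's own length, with the two ratios averaged. The function name `ci_overlap`, the truncation at zero for disjoint intervals, and the example intervals are all assumptions made here for illustration; none of them is taken from the paper.

```python
def ci_overlap(original, masked):
    """Average relative overlap between two confidence intervals.

    Each argument is a (lower, upper) pair. The result is 1.0 when the
    intervals coincide and 0.0 when they do not intersect (assumption:
    a negative overlap is truncated at zero).
    """
    lo_o, hi_o = original
    lo_m, hi_m = masked
    if hi_o <= lo_o or hi_m <= lo_m:
        raise ValueError("each interval must have lower < upper")

    # Length of the region shared by both intervals.
    shared = max(0.0, min(hi_o, hi_m) - max(lo_o, lo_m))

    # Average the shared length relative to each interval's own length.
    return 0.5 * (shared / (hi_o - lo_o) + shared / (hi_m - lo_m))


# Hypothetical intervals, not values from the CSRG study.
overlap = ci_overlap((0.10, 0.50), (0.15, 0.55))
print(f"{overlap:.1%}")  # prints 87.5%
```

Under this definition, an overlap near 100% means the interval from the masked data covers almost the same range as the interval from the original data, so a reader would reach nearly the same conclusion from either.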

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466287
DOI: http://dx.doi.org/10.1080/01621459.2021.1909597
