Fully Synthetic Data for Complex Surveys.

Shirley Mathur, Yajuan Si, Jerome P Reiter

Surv Methodol

Department of Statistical Science, 214a Old Chemistry Building, Duke University, Durham, NC 27708-0251.

Published: December 2024

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993). Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as public use files. To facilitate variance estimation, we use the framework of multiple imputation with two data generation strategies. In the first, we generate multiple data sets from each simple random sample. In the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate the repeated sampling properties of the combining rules via simulation studies, including comparisons with synthetic data generation based on pseudo-likelihood methods. We apply the proposed methods to a subset of data from the American Community Survey.
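The generation pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`wfpbb_pseudo_population`, `synthesize`) are hypothetical. The weighted Pólya urn step follows a commonly cited form of the weighted finite population Bayesian bootstrap and may differ in detail from the paper. The normal synthesis model is a placeholder for whatever model an agency would actually fit. The sketch follows the second strategy (one synthetic data set per simple random sample) and does not implement the paper's combining rules.

```python
import numpy as np


def wfpbb_pseudo_population(weights, rng):
    """Draw one pseudo-population as indices into the observed sample.

    Assumed form of the weighted Polya urn: at step k, unit i is drawn with
    probability proportional to w_i - 1 + l_i * (N - n) / n, where l_i is the
    number of times unit i has been drawn so far and N = round(sum(w)).
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    N = int(round(w.sum()))
    if np.any(w < 1.0) or N <= n:
        raise ValueError("weights must be >= 1 and sum to more than n")
    ratio = (N - n) / n
    counts = np.zeros(n)  # l_i: extra copies of each sampled unit
    for _ in range(N - n):
        p = w - 1.0 + counts * ratio
        i = rng.choice(n, p=p / p.sum())
        counts[i] += 1.0
    # Each sampled unit appears once plus its number of urn draws.
    return np.repeat(np.arange(n), 1 + counts.astype(int))


def synthesize(y, weights, n_srs, n_syn, m, rng):
    """Return m synthetic data sets, one per simple random sample."""
    y = np.asarray(y, dtype=float)
    datasets = []
    for _ in range(m):
        # 1. Pseudo-population that accounts for the survey weights.
        pop = y[wfpbb_pseudo_population(weights, rng)]
        # 2. Simple random sample (without replacement) from it.
        srs = rng.choice(pop, size=n_srs, replace=False)
        # 3. Placeholder synthesis model: a normal fit to the SRS.
        mu, sigma = srs.mean(), srs.std(ddof=1)
        # 4. Simulated values that would be released as the public use file.
        datasets.append(rng.normal(mu, sigma, size=n_syn))
    return datasets


rng = np.random.default_rng(0)
synthetic = synthesize(
    y=[1.0, 2.0, 4.0], weights=[2, 3, 5], n_srs=5, n_syn=7, m=3, rng=rng
)
```

For the first strategy, step 4 would be repeated several times for each simple random sample instead of once.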

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759325

Top keywords: synthetic data; simple random; fully synthetic; data; public files; random samples; multiple imputation; data generation; random sample; combining rules
