Fully Synthetic Data for Complex Surveys.

Surv Methodol

Department of Statistical Science, 214a Old Chemistry Building, Duke University, Durham, NC 27708-0251.

Published: December 2024

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993). Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as public use files. To facilitate variance estimation, we use the framework of multiple imputation with two data generation strategies. In the first, we generate multiple data sets from each simple random sample. In the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate the repeated sampling properties of the combining rules via simulation studies, including comparisons with synthetic data generation based on pseudo-likelihood methods. We apply the proposed methods to a subset of data from the American Community Survey.
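The four steps in the abstract (pseudo-population, simple random sample, synthesis model, simulated release) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes a single continuous variable, survey weights of at least 1 that sum to roughly the population size, a weighted Polya urn in the spirit of the weighted finite population Bayesian bootstrap (the full procedure may include further steps, such as an initial resampling of the sample), and a toy plug-in normal synthesis model that ignores parameter uncertainty. The function names `polya_pseudo_population` and `synthesize` and all parameter names are hypothetical.

```python
import numpy as np


def polya_pseudo_population(weights, rng):
    """Return sample indices forming a pseudo-population of size N ~ sum(weights).

    Assumption: a weighted Polya urn; unit i is drawn with probability
    proportional to (w_i - 1) + (times already drawn) * (N - n) / n.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    N = int(round(w.sum()))
    w = w * N / w.sum()              # rescale so the weights sum exactly to N
    counts = np.zeros(n)             # how often each unit has been drawn so far
    ratio = (N - n) / n
    extra = np.empty(N - n, dtype=int)
    for k in range(N - n):
        probs = np.clip(w - 1.0, 0.0, None) + counts * ratio
        extra[k] = rng.choice(n, p=probs / probs.sum())
        counts[extra[k]] += 1
    # The observed sample plus the N - n urn draws make up the pseudo-population.
    return np.concatenate([np.arange(n), extra])


def synthesize(y, weights, m=5, r=1, n_srs=100, n_syn=100, seed=0):
    """Generate r synthetic data sets from each of m simple random samples.

    r > 1 mirrors the first strategy in the abstract; r = 1 mirrors the second.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    out = []
    for _ in range(m):
        pop = y[polya_pseudo_population(weights, rng)]      # pseudo-population
        srs = rng.choice(pop, size=n_srs, replace=False)    # simple random sample
        mu, sigma = srs.mean(), srs.std(ddof=1)             # toy synthesis model
        out.append([rng.normal(mu, sigma, size=n_syn) for _ in range(r)])
    return out
```

Calling `synthesize(y, weights, m=10, r=1)` would return ten groups of one synthetic data set each. Inference from such data sets requires the combining rules presented in the paper, which this sketch does not attempt to reproduce.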

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759325
