HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis.

Entropy (Basel)

Peng Cheng Laboratory, Department of New Networks, Shenzhen 518000, China.

Published: December 2022

In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. Existing methods for PPDS in a decentralized setting use federated generative models with differential privacy. Unfortunately, these models apply only to image or text data, not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes), and real-world datasets often have highly imbalanced data distributions. Existing methods struggle to model such scenarios because the continuous columns spread across clients follow multimodal distributions and the clients' categorical attributes are highly imbalanced. To solve these problems, we propose HT-Fed-GAN, a federated generative model for decentralized tabular data synthesis. HT-Fed-GAN has three main components: a federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), designed to handle multimodal distributions; federated conditional one-hot encoding with conditional sampling, for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN, for privacy-preserving decentralized data modeling. Experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between data utility and privacy level. In terms of data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables, and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model on machine learning tasks.
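The categorical-attribute component can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the protocol, so the sketch assumes that each client shares plain per-category counts with a server, and it uses log-frequency weighting (a rebalancing heuristic known from conditional tabular GANs) as a stand-in for the paper's conditional sampling. All function names and the example counts are hypothetical.

```python
import math
import random
from collections import Counter


def merge_client_counts(client_counts):
    """Sum per-client category counts into one global Counter."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)
    return total


def one_hot(category, vocabulary):
    """Encode a category against the shared global vocabulary."""
    return [1 if category == v else 0 for v in vocabulary]


def log_frequency_probs(global_counts, vocabulary):
    """Sampling probabilities proportional to log(1 + count).

    Assumption: log-frequency weighting stands in for the paper's
    rebalancing step; it flattens the gap between frequent and rare
    categories without making them uniform.
    """
    weights = [math.log1p(global_counts[v]) for v in vocabulary]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical local counts for one categorical column on two clients.
client_counts = [
    Counter({"approved": 950, "rejected": 50}),
    Counter({"approved": 900, "rejected": 80, "pending": 20}),
]

global_counts = merge_client_counts(client_counts)
vocabulary = sorted(global_counts)  # same category order on every client
probs = log_frequency_probs(global_counts, vocabulary)

# Draw one condition for the generator and encode it as a one-hot vector.
rng = random.Random(0)
condition = rng.choices(vocabulary, weights=probs, k=1)[0]
condition_vector = one_hot(condition, vocabulary)
```

Because the vocabulary is built from the merged counts, a category that appears on only one client (here `pending`) still gets a position in every client's one-hot vector. Sharing raw counts as done here offers no privacy guarantee by itself; a real system would have to protect this step as well.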

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858387
DOI: http://dx.doi.org/10.3390/e25010088

Similar Publications

Large-scale and long-term wildlife research and monitoring using camera traps: a continental synthesis.

Biol Rev Camb Philos Soc

January 2025

Wildlife Observatory of Australia (WildObs), Queensland Cyber Infrastructure Foundation (QCIF), Brisbane, Queensland, 4072, Australia.

Camera traps are widely used in wildlife research and monitoring, so it is imperative to understand their strengths, limitations, and potential for increasing impact. We investigated a decade of use of wildlife cameras (2012-2022) with a case study on Australian terrestrial vertebrates using a multifaceted approach. We (i) synthesised information from a literature review; (ii) conducted an online questionnaire of 132 professionals; (iii) hosted an in-person workshop of 28 leading experts representing academia, non-governmental organisations (NGOs), and government; and (iv) mapped camera trap usage based on all sources.

Background: Clinical guideline development preferentially relies on evidence from randomized controlled trials (RCTs). RCTs are gold-standard methods to evaluate the efficacy of treatments with the highest internal validity but limited external validity, in the sense that their findings may not always be applicable to or generalizable to clinical populations or population characteristics. The external validity of RCTs for the clinical population is constrained by the lack of tailored epidemiological data analysis designed for this purpose due to data governance, consistency of disease or condition definitions, and reduplicated effort in analysis code.

Objectives: To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer with missing laboratory indicators.

Methods: Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. The missing ovarian cancer laboratory indicator features were projected to the latent space to obtain a classification model using the representational learning classification model based on discriminative learning and mutual information coupled with feature projection significance score consistency and missing location estimation.

Diffuse Large B-cell Lymphoma (DLBCL) is a lymphatic cancer of steadily growing incidence. Its diagnostic and follow-up rely on the analysis of clinical biomarkers and 18F-Fluorodeoxyglucose (FDG)-PET/CT images. In this context, we target the problem of assisting in the early identification of high-risk DLBCL patients from both images and tabular clinical data.

Objective: To map the literature on the use of exergames in the rehabilitation of school-age children with brain tumors, in any context.

Method: Scoping review protocol developed using the recommendations of the Joanna Briggs Institute. The search will include aggregators, databases, indexes, repositories, and research browsers, without limitation as to the year of publication.
