In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, existing PPDS methods rely on federated generative models with differential privacy. Unfortunately, these models apply only to image or text data, not to tabular data. Unlike images, tabular data usually contain mixed data types (discrete and continuous attributes), and real-world datasets often have highly imbalanced distributions. Existing methods can hardly model such data because the continuous columns held by different clients follow multimodal distributions and the clients' categorical attributes are highly imbalanced. To solve these problems, we propose HT-Fed-GAN, a federated generative model for decentralized tabular data synthesis. HT-Fed-GAN has three main components: a federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), designed to handle multimodal distributions; federated conditional one-hot encoding with conditional sampling, for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN, for privacy-preserving decentralized data modeling. Experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between data utility and privacy level. In terms of data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables, and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model on machine learning tasks.
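To make the idea of a global categorical representation with rebalanced conditional sampling more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each client can report per-category counts for one column, that the global one-hot vocabulary is the sorted union of all reported categories, and that conditions are drawn with log-frequency weights (a weighting used in some conditional tabular GANs; the abstract does not say whether HT-Fed-GAN uses it). All function names and the example counts are hypothetical, and no privacy mechanism is shown.

```python
import math
import random
from collections import Counter


def build_global_vocab(client_counts):
    """Merge per-client category counts into one global, ordered vocabulary."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)  # adds counts category by category
    categories = sorted(total)  # fixed order shared by every client
    return categories, total


def one_hot(category, categories):
    """Encode a category against the global vocabulary."""
    return [1 if c == category else 0 for c in categories]


def log_frequency_probs(categories, total):
    """Probabilities proportional to log(1 + count), which flattens imbalance."""
    weights = [math.log1p(total[c]) for c in categories]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_condition(categories, probs, rng):
    """Draw one category to condition on and return it with its one-hot vector."""
    chosen = rng.choices(categories, weights=probs, k=1)[0]
    return chosen, one_hot(chosen, categories)


# Hypothetical counts for one categorical column held by two clients.
client_counts = [
    {"A": 980, "B": 20},
    {"A": 900, "C": 5},
]
categories, total = build_global_vocab(client_counts)
probs = log_frequency_probs(categories, total)
chosen, condition = sample_condition(categories, probs, random.Random(0))
```

In this sketch, category "C" appears on only one client, yet every client would encode it at the same one-hot position, and the log weighting makes rare categories more likely to be chosen as conditions than their raw frequency would.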
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858387 | PMC |
| http://dx.doi.org/10.3390/e25010088 | DOI Listing |
Biol Rev Camb Philos Soc
January 2025
Wildlife Observatory of Australia (WildObs), Queensland Cyber Infrastructure Foundation (QCIF), Brisbane, Queensland, 4072, Australia.
Camera traps are widely used in wildlife research and monitoring, so it is imperative to understand their strengths, limitations, and potential for increasing impact. We investigated a decade of use of wildlife cameras (2012-2022) with a case study on Australian terrestrial vertebrates using a multifaceted approach. We (i) synthesised information from a literature review; (ii) conducted an online questionnaire of 132 professionals; (iii) hosted an in-person workshop of 28 leading experts representing academia, non-governmental organisations (NGOs), and government; and (iv) mapped camera trap usage based on all sources.
J Med Internet Res
January 2025
Advanced Care Research Centre, Usher Institute, The University of Edinburgh, Edinburgh, United Kingdom.
Background: Clinical guideline development preferentially relies on evidence from randomized controlled trials (RCTs). RCTs are gold-standard methods to evaluate the efficacy of treatments with the highest internal validity but limited external validity, in the sense that their findings may not always be applicable to or generalizable to clinical populations or population characteristics. The external validity of RCTs for the clinical population is constrained by the lack of tailored epidemiological data analysis designed for this purpose due to data governance, consistency of disease or condition definitions, and reduplicated effort in analysis code.
Nan Fang Yi Ke Da Xue Xue Bao
January 2025
School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China.
Objectives: To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer with missing laboratory indicators.
Methods: Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. The missing ovarian cancer laboratory indicator features were projected to the latent space to obtain a classification model using the representational learning classification model based on discriminative learning and mutual information coupled with feature projection significance score consistency and missing location estimation.
Comput Med Imaging Graph
December 2024
Nantes Université, Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France.
Diffuse Large B-cell Lymphoma (DLBCL) is a lymphatic cancer of steadily growing incidence. Its diagnosis and follow-up rely on the analysis of clinical biomarkers and 18F-Fluorodeoxyglucose (FDG)-PET/CT images. In this context, we target the problem of assisting in the early identification of high-risk DLBCL patients from both images and tabular clinical data.
Rev Gaucha Enferm
January 2025
RISE - Rede de Investigação em Saúde. Porto, Portugal.
Objective: To map the literature on the use of exergames in the rehabilitation of school-age children with brain tumors, in any context.
Method: Scoping review protocol developed using the recommendations of the Joanna Briggs Institute. The search will include aggregators, databases, indexes, repositories, and research browsers, without limitation as to the year of publication.