Purpose: To assess the validity of privacy-preserving synthetic data by comparing results from synthetic versus original EHR data analysis.
Methods: A published retrospective cohort study on the real-world effectiveness of COVID-19 vaccines, conducted by Maccabi Healthcare Services in Israel, was replicated using synthetic data generated from the same source, and the results were compared between the synthetic and original datasets. The endpoints were COVID-19 infection, symptomatic COVID-19 infection, and hospitalization due to infection; each was also assessed in several demographic and clinical subgroups. Several metrics were used to compare synthetic and original data estimates: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of results.
Results: The distributions of demographic and clinical characteristics differed very little between synthetic and original data (SMD < 0.01). In the comparison of vaccine effectiveness, assessed as relative risk reduction, between synthetic and original data, there was 100% decision agreement, 100% estimate agreement, and a high level of confidence interval overlap (88.7%-99.7%) in all five replicates across all subgroups. Similar findings were obtained for vaccine effectiveness against symptomatic COVID-19 infection. In the comparison of hazard ratios for COVID-19-related hospitalization and odds ratios for symptomatic COVID-19 infection, the Wald tests suggested no significant difference between the respective effect estimates in all five replicates for all patient subgroups, but the estimate and decision metrics disagreed in some subgroups and replicates.
Conclusions: Overall, comparison of synthetic versus original real-world data demonstrated good validity and reliability. Transparency on the process to generate high fidelity synthetic data and assurances of patient privacy are warranted.
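The comparison metrics named in the Methods can be sketched in code. The abstract does not define them, so the formulas below are assumptions based on definitions commonly used in synthetic-data replication work, not necessarily the authors' exact implementation: SMD for a binary characteristic, confidence interval overlap as the average share of each interval covered by their intersection, estimate agreement as the synthetic estimate falling inside the original interval, decision agreement as matching significance and direction, and a Wald z-test on the difference between two estimates. All function names and input values are hypothetical, and effect estimates are assumed to be on a log scale with a null value of 0.

```python
import math


def smd_binary(p_orig, p_syn):
    """Standardized mean difference for a binary characteristic (assumed formula)."""
    pooled_sd = math.sqrt((p_orig * (1 - p_orig) + p_syn * (1 - p_syn)) / 2)
    return 0.0 if pooled_sd == 0 else abs(p_orig - p_syn) / pooled_sd


def ci_overlap(lo_o, hi_o, lo_s, hi_s):
    """Average fraction of each interval covered by their intersection.

    Negative values indicate intervals that do not overlap at all.
    """
    intersection = min(hi_o, hi_s) - max(lo_o, lo_s)
    return 0.5 * (intersection / (hi_o - lo_o) + intersection / (hi_s - lo_s))


def estimate_agreement(est_s, lo_o, hi_o):
    """True if the synthetic point estimate lies inside the original interval."""
    return lo_o <= est_s <= hi_o


def decision_agreement(est_o, lo_o, hi_o, est_s, lo_s, hi_s, null=0.0):
    """True if both analyses lead to the same conclusion about the null value."""
    sig_o = not (lo_o <= null <= hi_o)
    sig_s = not (lo_s <= null <= hi_s)
    if sig_o != sig_s:
        return False
    if not sig_o:
        return True  # both non-significant
    # Both significant: require the same direction of effect.
    return (est_o - null) * (est_s - null) > 0


def wald_test(est_o, se_o, est_s, se_s):
    """Two-sided Wald z-test for the difference between two estimates."""
    z = (est_s - est_o) / math.sqrt(se_o ** 2 + se_s ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical log hazard ratios and intervals, for illustration only.
    orig = {"est": -0.50, "lo": -0.80, "hi": -0.20, "se": 0.153}
    syn = {"est": -0.45, "lo": -0.76, "hi": -0.14, "se": 0.158}

    print("CI overlap:", ci_overlap(orig["lo"], orig["hi"], syn["lo"], syn["hi"]))
    print("Estimate agreement:", estimate_agreement(syn["est"], orig["lo"], orig["hi"]))
    print("Decision agreement:", decision_agreement(
        orig["est"], orig["lo"], orig["hi"], syn["est"], syn["lo"], syn["hi"]))
    print("Wald test (z, p):", wald_test(orig["est"], orig["se"], syn["est"], syn["se"]))
```

Under these definitions the metrics can diverge: a Wald test may find no significant difference between two estimates while estimate or decision agreement fails, for example when a synthetic interval narrowly includes the null value and the original interval narrowly excludes it.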
DOI: http://dx.doi.org/10.1002/pds.70019
Appl Microbiol Biotechnol
December 2024
Life Sciences and Bioengineering Center, Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA, USA.
Transcriptomics is a powerful approach for functional genomics and systems biology, yet it can also be used for genetic part discovery. Here, we derive constitutive and light-regulated promoters directly from transcriptomics data of the basidiomycete red yeast Xanthophyllomyces dendrorhous CBS 6938 (anamorph Phaffia rhodozyma) and use these promoters with other genetic elements to create a modular synthetic biology parts collection for this organism. X.
ACS Appl Mater Interfaces
December 2024
Key Laboratory of Synthetic and Biological Colloids, Ministry of Education, School of Chemical and Material Engineering, Jiangnan University, 214122 Jiangsu, China.
Nanometric solid solution alloys are utilized in a broad range of fields, including catalysis, energy storage, medical application, and sensor technology. Unfortunately, the synthesis of these alloys becomes increasingly challenging as the disparity between the metal elements grows, due to differences in atomic sizes, melting points, and chemical affinities. This study utilized a data-driven approach incorporating sample balancing enhancement techniques and multilayer perceptron (MLP) algorithms to improve the model's ability to handle imbalanced data, significantly boosting the efficiency of experimental parameter optimization.
J Med Eng Technol
December 2024
Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.
Nowadays, photoplethysmograph (PPG) technology is being used more often in smart devices and mobile phones due to advancements in information and communication technology in the health field, particularly in monitoring cardiac activities. Developing generative models to generate synthetic PPG signals requires overcoming challenges like data diversity and limited data available for training deep learning models. This paper proposes a generative model by adopting a genetic programming (GP) approach to generate increasingly diversified and accurate data using an initial PPG signal sample.
J Biol Eng
December 2024
Department of Chemical Engineering (BK21 FOUR Integrated Engineering), Kyung Hee University, Yongin-si, Gyeonggi-do, 17104, Republic of Korea.
The biological production of lipids presents a sustainable method for generating fuels and chemicals. Recognized as safe and enhanced by advanced synthetic biology and metabolic engineering tools, yeasts are becoming versatile hosts for industrial applications. However, lipids accumulate predominantly as triacylglycerides in yeasts, which are suboptimal for industrial uses.
BMC Bioinformatics
December 2024
School of Computer Engineering, Jiangsu Ocean University, Lianyungang, 222005, China.
Background: Cancer classification has consistently been a challenging problem, with the main difficulties being high-dimensional data and the collection of patient samples. Concretely, obtaining patient samples is a costly and resource-intensive process, and imbalances often exist between samples. Moreover, expression data are characterized by high dimensionality, small sample sizes, and high noise, which can easily lead to problems such as the curse of dimensionality and overfitting.