Synthetic data generation methods in healthcare: A review on open-source tools and methods.

Comput Struct Biotechnol J

Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina GR45110, Greece.

Published: December 2024

Synthetic data generation has emerged as a promising solution to the challenges posed by data scarcity and privacy concerns, and to the need to train artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases, focusing on tabular, imaging, radiomics, time-series, and omics data. Studies involving multimodal synthetic data generation were also explored. The method used for synthetic data generation was identified in each study and categorized as statistical, probabilistic, machine learning, or deep learning. Emphasis was given to the programming languages used to implement each method. Our evaluation revealed that the majority of the studies use synthetic data generators to: (i) reduce the cost and time required for clinical trials for rare diseases and conditions, (ii) enhance the predictive power of AI models in personalized medicine, (iii) ensure the delivery of fair treatment recommendations across diverse patient populations, and (iv) enable researchers to access high-quality, representative multimodal datasets without exposing sensitive patient information, among others. Deep learning-based synthetic data generators were used in 72.6% of the included studies, and 75.3% of the generators were implemented in Python. Finally, we provide thorough documentation of open-source repositories to accelerate research in the field.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11301073
DOI: http://dx.doi.org/10.1016/j.csbj.2024.07.005


