Enhancing supervised analysis of imbalanced untargeted metabolomics datasets using a CWGAN-GP framework for data augmentation.

Comput Biol Med

FT-ICR and Structural Mass Spectrometry Laboratory, Faculdade de Ciências, Universidade de Lisboa, Portugal; Biosystems and Integrative Sciences Institute (BioISI), Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal.

Published: January 2025

Untargeted metabolomics is an extremely useful approach for the discrimination of biological systems and biomarker identification. However, data analysis workflows are complex and face many challenges. Two of these challenges are the need for large sample sizes and the possibility of severe class imbalance, which is particularly common in clinical studies. Class imbalance can make statistical models less generalizable, increase the risk of overfitting, and skew the analysis in favour of the majority class. One possible approach to mitigate this problem is data augmentation. However, the use of artificial data requires adequate data augmentation methods and criteria for assessing the quality of the generated data. In this work, we used Conditional Wasserstein Generative Adversarial Networks with Gradient Penalty (CWGAN-GPs) for data augmentation of metabolomics data. Using a set of benchmark datasets, we applied several criteria to evaluate the quality of the generated data and assessed the performance of supervised predictive models trained on datasets that included such data. CWGAN-GP models generated realistic data with characteristics identical to those of real samples, mostly avoiding mode collapse. Furthermore, in cases of class imbalance, supplementing the minority class with generated samples improved the performance of predictive models. This improvement was evident for high-quality datasets with well-separated classes. Conversely, improvements were quite modest for datasets with high class overlap. This trend was confirmed using synthetic datasets with different levels of class separation. Data augmentation is a viable procedure to alleviate class imbalance problems but is not universally beneficial in metabolomics.
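The step of supplementing the minority class with generated samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `samples_needed` and `balance_with_generator` are hypothetical, and `generate(label, n)` is a stand-in for sampling from a trained conditional generator such as a CWGAN-GP. The abstract does not say how many samples were generated per class, so balancing every class up to the majority size is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Sample = List[float]


def samples_needed(labels: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Number of generated samples each class needs to match the largest class."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def balance_with_generator(
    samples: Sequence[Sample],
    labels: Sequence[Hashable],
    generate: Callable[[Hashable, int], List[Sample]],
) -> Tuple[List[Sample], List[Hashable]]:
    """Append generated samples so that every class reaches the majority size.

    `generate(label, n)` is a hypothetical stand-in for drawing n feature
    vectors from a trained conditional generator, conditioned on `label`.
    """
    out_x, out_y = list(samples), list(labels)
    for label, n in samples_needed(labels).items():
        if n > 0:
            out_x.extend(generate(label, n))
            out_y.extend([label] * n)
    return out_x, out_y


# Toy data: 5 "control" and 2 "disease" samples with 3 features each.
real_x = [[0.0, 0.0, 0.0]] * 5 + [[1.0, 1.0, 1.0]] * 2
real_y = ["control"] * 5 + ["disease"] * 2


def dummy_generate(label: Hashable, n: int) -> List[Sample]:
    # Placeholder only: a real generator would return realistic feature profiles.
    return [[1.0, 1.0, 1.0] for _ in range(n)]


aug_x, aug_y = balance_with_generator(real_x, real_y, dummy_generate)
print(Counter(aug_y))
```

In a sketch like this, generated samples would normally be added to the training split only, so that model performance is still measured on real samples.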

Source
http://dx.doi.org/10.1016/j.compbiomed.2024.109414

Publication Analysis

Top Keywords

data augmentation: 20
data: 12
class imbalance: 12
untargeted metabolomics: 8
quality generated: 8
generated data: 8
predictive models: 8
class: 7
datasets: 6
augmentation: 5

Similar Publications

Importance: Obsessive-compulsive and related disorders (OCRDs) encompass various neuropsychiatric conditions that cause significant distress and impair daily functioning. Although standard treatments are often effective, approximately 60% of patients may not respond adequately, underscoring the need for novel therapeutic approaches.

Objective: To evaluate improvement in OCRD symptoms associated with glutamatergic medications as monotherapy or as augmentation to selective serotonin reuptake inhibitors, with a focus on double-blind, placebo-controlled randomized clinical trials (RCTs).

Purpose: To quantify outer retina structural changes and define novel biomarkers of inherited retinal degeneration associated with biallelic mutations in RPE65 (RPE65-IRD) in patients before and after subretinal gene augmentation therapy with voretigene neparvovec (Luxturna).

Methods: Application of advanced deep learning for automated retinal layer segmentation, specifically tailored for RPE65-IRD. Quantification of five novel biomarkers for the ellipsoid zone (EZ): thickness, granularity, reflectivity, and intensity.

Partial-thickness rotator cuff tears (PTRCTs) are a common source of shoulder pathology, both in the aging population and in younger overhead athletes. Advanced imaging modalities used currently have led to increases in recognition, diagnosis, and treatment of these tears. The anatomy, five-layer histology, and relationship to the Ellman classification of PTRCTs have been well studied, with recent interest in radiographic predictors, such as the critical shoulder angle and acromial index.

Injury Patterns in Academy-Level Male Youth Soccer Players: A 3-Season Prospective Cohort Study.

Clin J Sport Med

January 2025

Department of Orthopaedic Surgery and Sports Medicine, Children's Mercy, Kansas City, Missouri; and.

Objective: To report injury epidemiology in youth male academy-level athletes in the United States.

Design: An observational study on injury occurrences and playing time over the 2019 to 2020, 2020 to 2021, and 2021 to 2022 soccer seasons.

Setting: Data collected from a single midwestern soccer academy in the United States in partnership with a tertiary care level I pediatric health institution.

Lack of timely prognosis of cardiovascular condition (CVC) is resulting in increased mortality across the globe. Currently, available techniques are confined to medical facilities and need the intervention of specialists. Frequently, this impedes timely treatment, driven by socioeconomic factors.
