A comparative study in class imbalance mitigation when working with physiological signals.

Front Digit Health

Wearable Technologies Lab, Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom.

Published: March 2024

Class imbalance is a common challenge in classification tasks that aim to detect particularly infrequent medical events, such as apnoea. This challenge can, however, be mitigated using class rebalancing algorithms. This work investigated 10 widely used data-level class imbalance mitigation methods for building a random forest (RF) model that aims to detect apnoea events from photoplethysmography (PPG) signals acquired from the neck. These methods are random undersampling (RandUS), random oversampling (RandOS), condensed nearest-neighbors (CNNUS), edited nearest-neighbors (ENNUS), Tomek's links (TomekUS), synthetic minority oversampling technique (SMOTE), Borderline-SMOTE (BLSMOTE), adaptive synthetic oversampling (ADASYN), SMOTE with TomekUS (SMOTETomek) and SMOTE with ENNUS (SMOTEENN). Feature-space transformation using PCA and KernelPCA was also examined as a potential way of providing better representations of the data for the class rebalancing methods to operate on. This work showed that RandUS is the best option for improving the sensitivity score (up to 11%). However, it could hinder the overall accuracy due to the reduced amount of training data. On the other hand, augmenting the data with new artificial data points was shown to be a non-trivial task that needs further development, especially in the presence of subject dependencies, as was the case in this work.
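To make the difference between removing data and synthesising data concrete, the following is a minimal sketch, in plain Python with no third-party libraries, of three of the methods listed above: RandUS, RandOS and SMOTE. The function names, the `k=5` neighbour setting and the toy two-feature data are hypothetical and chosen for illustration only; this is not the authors' implementation, and tested implementations of these methods are available in libraries such as imbalanced-learn. The sketch ignores subject IDs, so SMOTE may interpolate between windows belonging to different subjects, which is one way subject dependencies could complicate synthetic augmentation. Any resampling of this kind should be applied to the training split only.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """RandUS-style: keep a random subset of every class equal in size to the smallest class."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for label in set(y):
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """RandOS-style: duplicate random samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(n_max - n):
            i = rng.choice(idx)
            X_out.append(list(X[i]))
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority, k=5, seed=0):
    """Minimal binary SMOTE: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Subject IDs are ignored (hypothetical simplification)."""
    rng = random.Random(seed)
    pts = [list(x) for x, lab in zip(X, y) if lab == minority]
    if len(pts) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(pts) - 1)
    n_needed = max(Counter(y).values()) - len(pts)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(pts))
        base = pts[i]
        others = [p for j, p in enumerate(pts) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


if __name__ == "__main__":
    # Hypothetical toy data: 90 "normal" windows vs 10 "apnoea" windows, 2 features each.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    for name, (Xr, yr) in [
        ("RandUS", random_undersample(X, y)),
        ("RandOS", random_oversample(X, y)),
        ("SMOTE", smote(X, y, minority=1)),
    ]:
        print(name, len(Xr), dict(Counter(yr)))
```

On this toy data, RandUS leaves only 20 training samples (10 per class), which illustrates why it can hurt overall accuracy by discarding data. RandOS and SMOTE both grow the minority class to 90 samples. RandOS does this by repeating existing points, while SMOTE creates new points between existing minority neighbours.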

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11002073
DOI: http://dx.doi.org/10.3389/fdgth.2024.1377165

Similar Publications

NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class.

PLoS One

December 2024

Department of Industrial & Management Engineering, Korea National University of Transportation, Chungju, South Korea.

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability.

Mortality prediction of inpatients with NSTEMI in a premier hospital in China based on stacking model.

PLoS One

December 2024

Department of Cardiology, The People's Hospital of China Medical University, The People's Hospital of Liaoning Province, Shenyang, China.

Background: Acute myocardial infarction (AMI) remains a leading cause of hospitalization and death in China. Accurate mortality prediction for inpatients is crucial for clinical decision-making in non-ST-segment elevation myocardial infarction (NSTEMI) patients.

Methods: A total of 3061 patients diagnosed with NSTEMI between January 1, 2017 and December 31, 2022 were enrolled in this study.

Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children, potentially leading to coronary artery complications and, in severe cases, mortality if untreated. However, KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent data imbalance further complicates accurate prediction when using traditional machine learning and statistical methods. This paper introduces two advanced approaches to address these challenges, enhancing prediction accuracy and generalizability.

Ovaries are of paramount importance in reproduction as they produce female gametes through a complex developmental process known as folliculogenesis. With a view to better understanding the mechanisms of folliculogenesis and developing novel pharmacological approaches to control it, it is important to accurately and quantitatively assess the later stages of ovarian folliculogenesis (i.e.

When utilizing convolutional neural networks for wheat disease identification, the training phase typically requires a substantial amount of labeled data. However, labeling data is both complex and costly. Additionally, the model's recognition performance is often disrupted by complex factors in natural environments.
