Class imbalance is a common challenge in classification tasks that aim to detect infrequent medical events, such as apnoea. This challenge can, however, be mitigated using class rebalancing algorithms. This work investigated 10 widely used data-level class imbalance mitigation methods with the aim of building a random forest (RF) model that detects apnoea events from photoplethysmography (PPG) signals acquired from the neck. The methods are random undersampling (RandUS), random oversampling (RandOS), condensed nearest-neighbors (CNNUS), edited nearest-neighbors (ENNUS), Tomek's links (TomekUS), synthetic minority oversampling technique (SMOTE), Borderline-SMOTE (BLSMOTE), adaptive synthetic oversampling (ADASYN), SMOTE with TomekUS (SMOTETomek) and SMOTE with ENNUS (SMOTEENN). Feature-space transformation using principal component analysis (PCA) and KernelPCA was also examined as a potential way of giving the class rebalancing methods a better representation of the data to operate on. This work showed that RandUS is the best option for improving the sensitivity score (up to 11%). However, it could hinder the overall accuracy because it reduces the amount of training data. On the other hand, augmenting the data with new artificial data points was shown to be a non-trivial task that needs further development, especially in the presence of subject dependencies, as was the case in this work.
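As a rough illustration of the random undersampling (RandUS) idea named in the abstract, the sketch below discards majority-class samples at random until every class is as small as the rarest one. It is a minimal standard-library sketch, not the authors' pipeline: the function name `random_undersample`, the seed, and the toy data (10 positive samples out of 100) are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class ends up with as many samples as the smallest class."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # Size of the rarest class sets the target size for all classes.
    n_min = min(len(indices) for indices in by_class.values())

    # Draw n_min indices without replacement from each class.
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original sample order

    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (not from the study): every tenth sample is a positive event.
X = [[float(i)] for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

X_res, y_res = random_undersample(X, y)
print("before:", dict(Counter(y)))
print("after: ", dict(Counter(y_res)))
```

In this toy case the training set shrinks from 100 samples to 20, which shows concretely why undersampling can trade overall accuracy for sensitivity.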
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11002073 | PMC |
http://dx.doi.org/10.3389/fdgth.2024.1377165 | DOI Listing |
PLoS One
December 2024
Department of Industrial & Management Engineering, Korea National University of Transportation, Chungju, South Korea.
Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability.
PLoS One
December 2024
Department of Cardiology, The People's Hospital of China Medical University, The People's Hospital of Liaoning Province, Shenyang, China.
Background: Acute myocardial infarction (AMI) remains a leading cause of hospitalization and death in China. Accurate prediction of inpatient mortality is crucial for clinical decision-making for patients with non-ST-segment elevation myocardial infarction (NSTEMI).
Methods: A total of 3061 patients diagnosed with NSTEMI between January 1, 2017 and December 31, 2022 were enrolled in this study.
PLoS One
December 2024
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan.
Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children, potentially leading to coronary artery complications and, in severe cases, mortality if untreated. However, KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent data imbalance further complicates accurate prediction when using traditional machine learning and statistical methods. This paper introduces two advanced approaches to address these challenges, enhancing prediction accuracy and generalizability.
Sci Rep
December 2024
INRAE, CNRS, Université de Tours, PRC, Nouzilly, 37380, France.
Ovaries are of paramount importance in reproduction as they produce female gametes through a complex developmental process known as folliculogenesis. In the prospect of better understanding the mechanisms of folliculogenesis and of developing novel pharmacological approaches to control it, it is important to accurately and quantitatively assess the later stages of ovarian folliculogenesis (i.e.
Sci Rep
December 2024
School of Information and Control Engineering, Jilin University of Chemical Technology, Jilin, 132022, Jinlin, China.
When utilizing convolutional neural networks for wheat disease identification, the training phase typically requires a substantial amount of labeled data. However, labeling data is both complex and costly. Additionally, the model's recognition performance is often disrupted by complex factors in natural environments.