A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data.

Sci Rep

Department of Industrial Engineering, Sharif University of Technology, 9414 Azadi Ave, P.O. Box 11155, Tehran, 1458889694, Iran.

Published: March 2024

This paper proposes a Cluster-based Synthetic Minority Oversampling Technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, to obtain a balanced dataset and to address three issues: data redundancy after over-sampling, information loss in under-sampling, and random selection of samples for sampling and sample generation. Its performance is evaluated against state-of-the-art competing algorithms on 20 benchmark imbalanced datasets using two measures: the harmonic mean of precision and recall (F1) and the area under the receiver operating characteristic curve (AUC). According to the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
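The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the general idea of cluster-based both-sampling, not the authors' implementation. It assumes that each class is clustered separately with plain k-means, that both classes are resampled to the mean of the two class sizes, and that synthetic minority points are interpolated between a point and one of its nearest neighbours in the same cluster. All function names (`kmeans`, `quotas`, `smote_like`, `both_sample`) and parameters are hypothetical, and the ensemble classifier step (XGBoost, random forest, or bagging) is left out.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def quotas(clusters, target):
    """Split `target` across clusters in proportion to their sizes."""
    total = sum(len(cl) for cl in clusters)
    q = [target * len(cl) // total for cl in clusters]
    # Give the rounding remainder to the largest clusters first.
    order = sorted(range(len(clusters)), key=lambda i: -len(clusters[i]))
    for i in range(target - sum(q)):
        q[order[i % len(q)]] += 1
    return q


def smote_like(cluster, n_new, rng, k_nn=3):
    """Create n_new points by interpolating toward a near neighbour in the cluster."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(cluster)
        neighbours = sorted(cluster, key=lambda p: math.dist(a, p))[1:k_nn + 1]
        b = rng.choice(neighbours) if neighbours else a
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def both_sample(majority, minority, k=2, seed=0):
    """Under-sample the majority and over-sample the minority, cluster by cluster."""
    rng = random.Random(seed)
    # Assumption: both classes meet at the mean of the two class sizes.
    target = (len(majority) + len(minority)) // 2
    maj_clusters = kmeans(majority, k, rng)
    min_clusters = kmeans(minority, k, rng)
    new_major = []
    for cl, q in zip(maj_clusters, quotas(maj_clusters, target)):
        new_major += rng.sample(cl, min(q, len(cl)))
    new_minor = []
    for cl, q in zip(min_clusters, quotas(min_clusters, target)):
        new_minor += cl + smote_like(cl, max(q - len(cl), 0), rng)
    return new_major, new_minor


# Toy data: 90 majority points and 10 minority points in two dimensions.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
new_major, new_minor = both_sample(majority, minority)
print(len(new_major), len(new_minor))  # 50 50
```

Sampling within clusters rather than over the whole class is one way to keep every region of the majority class represented after under-sampling and to keep synthetic minority points close to real ones, which is the kind of concern the abstract raises about information loss and random sample selection.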

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10908853
DOI: http://dx.doi.org/10.1038/s41598-024-55598-1
