An imbalanced dataset is commonly found in at least one class, which are typically exceeded by the other ones. A machine learning algorithm (classifier) trained with an imbalanced dataset predicts the majority class (frequently occurring) more than the other minority classes (rarely occurring). Training with an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques for reducing class imbalance issues can enhance classifiers' performance. In this study, we consider an imbalanced dataset from an educational context. Initially, we examine all shortcomings regarding the classification of an imbalanced dataset. Then, we apply data-level algorithms for class balancing and compare the performance of classifiers. The performance of the classifiers is measured using the underlying information in their confusion matrices, such as accuracy, precision, recall, and F measure. The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that undersampling and oversampling are effective for balancing datasets, but the latter dominates.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7417470 | PMC |
http://dx.doi.org/10.1186/s42492-020-00055-9 | DOI Listing |
Comput Med Imaging Graph
January 2025
Sapienza University of Rome, Department of Computer Control and Management Engineering Antonio Ruberti, 00185, Rome, Italy. Electronic address:
Predicting the outcome of antiretroviral therapies (ART) for HIV-1 is a pressing clinical challenge, especially when the ART includes drugs with limited effectiveness data. This scarcity of data can arise either due to the introduction of a new drug to the market or due to limited use in clinical settings, resulting in clinical dataset with highly unbalanced therapy representation. To tackle this issue, we introduce a novel joint fusion model, which combines features from a Fully Connected (FC) Neural Network and a Graph Neural Network (GNN) in a multi-modality fashion.
View Article and Find Full Text PDFJ Med Internet Res
January 2025
Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center For Child Health, Hang Zhou, China.
Background: Accurate classification of patient complaints is crucial for enhancing patient satisfaction management in health care settings. Traditional manual methods for categorizing complaints often lack efficiency and precision. Thus, there is a growing demand for advanced and automated approaches to streamline the classification process.
View Article and Find Full Text PDFSensors (Basel)
December 2024
State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Applying deep learning to unsupervised bearing fault diagnosis in complex industrial environments is challenging. Traditional fault detection methods rely on labeled data, which is costly and labor-intensive to obtain. This paper proposes a novel unsupervised approach, WDCAE-LKA, combining a wide kernel convolutional autoencoder (WDCAE) with a large kernel attention (LKA) mechanism to improve fault detection under unlabeled conditions, and the adaptive threshold module based on a multi-layer perceptron (MLP) dynamically adjusts thresholds, boosting model robustness in imbalanced scenarios.
View Article and Find Full Text PDFSensors (Basel)
December 2024
School of Mechatronics and Vehicle Engineering, East China Jiaotong University, Nanchang 330013, China.
Data imbalances present a serious problem for intelligent fault diagnosis. They can lead to reduced diagnostic precision, which can jeopardize equipment reliability and safety. Based on that, this paper proposes a novel fault diagnosis method combining the denoising diffusion implicit model (DDIM) with a new convolutional neural network framework.
View Article and Find Full Text PDFHeliyon
December 2024
Computer Science & Engineering Department, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya (Technological University of Madhya Pradesh), Bhopal, Madhya Pradesh, India.
A smart city is deemed smart enough because it has the capability to make decisions on its own. Artificial intelligence needs a lot of data from the physical world to make correct decisions. IoT sensor devices collect data from the surroundings, which is further used for predictive analytics.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!