KCO: Balancing class distribution in just-in-time software defect prediction using kernel crossover oversampling.

PLoS One

Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia.

Published: April 2024

Whether a defect prediction model is trained on a balanced or an imbalanced dataset has a large impact on how well it discovers future defects. Current resampling techniques address only the class imbalance and do not account for the redundancy and noise inherent in imbalanced datasets. To address this issue, we propose Kernel Crossover Oversampling (KCO), an oversampling technique based on kernel analysis and crossover interpolation. KCO aims to generate balanced datasets by increasing data diversity, which in turn reduces redundancy and noise. KCO first projects the multidimensional features onto two dimensions using Kernel Principal Component Analysis (KPCA). It then partitions the projected data distribution with spectral clustering to select the best region for interpolation. Finally, KCO generates new defect data by interpolating between different data templates within the selected clusters. In the prediction evaluation, KCO consistently produced average F-scores ranging from 21% to 63% across six datasets. The experimental results indicate that KCO delivers more effective prediction performance than the baseline techniques, and in particular that it consistently achieves higher F-scores in both within-project and cross-project predictions.
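The abstract does not specify KCO's kernel, number of clusters, region-selection rule, or exact crossover operator. The sketch below is a minimal NumPy illustration of the three stages under the following assumptions:

- KPCA uses an RBF kernel.
- The spectral clustering step is normalised (Ng-Jordan-Weiss style) with a fixed number of clusters `k`.
- The crossover is a per-feature blend between two defective modules drawn from the same cluster.

The names `kpca_2d`, `spectral_clusters`, and `kco_sketch`, and the parameters `gamma` and `k`, are hypothetical. This is not the authors' implementation. It also does not reproduce the paper's "best region" selection; it simply samples clusters in proportion to their size.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    # Pairwise squared Euclidean distances, turned into a Gaussian (RBF) kernel.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kpca_2d(X, gamma=1.0):
    """Project X onto its top-2 kernel principal components (RBF kernel assumed)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues come back ascending
    vals, vecs = vals[::-1][:2], vecs[:, ::-1][:, :2]
    # Projection of training point i on component j is sqrt(lambda_j) * v_ij.
    return vecs * np.sqrt(np.maximum(vals, 1e-12))


def spectral_clusters(Z, k, gamma=1.0, iters=50, seed=0):
    """Normalised spectral clustering of the 2-D points Z into k groups."""
    W = rbf_kernel(Z, Z, gamma)
    np.fill_diagonal(W, 0.0)                     # no self-affinity
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(1), 1e-12))
    A_norm = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(A_norm)
    U = vecs[:, -k:]                             # k leading eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Plain k-means on the rows of the spectral embedding U.
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(0)
    return labels


def kco_sketch(X_min, n_new, k=3, gamma=1.0, seed=0):
    """HYPOTHETICAL KCO-style oversampler: returns n_new synthetic defective rows."""
    rng = np.random.default_rng(seed)
    labels = spectral_clusters(kpca_2d(X_min, gamma), k, seed=seed)
    # A cluster can act as an interpolation region only if it has two parents.
    groups = [np.flatnonzero(labels == j) for j in range(k)]
    groups = [g for g in groups if len(g) >= 2] or [np.arange(len(X_min))]
    sizes = np.array([len(g) for g in groups], dtype=float)
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        g = groups[rng.choice(len(groups), p=sizes / sizes.sum())]
        a, b = X_min[rng.choice(g, 2, replace=False)]   # two parents, same cluster
        gap = rng.random(X_min.shape[1])                # per-feature weight in [0, 1)
        out[i] = a + gap * (b - a)                      # blend crossover child
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_min = rng.normal(size=(30, 5))     # toy minority (defective) feature matrix
    X_syn = kco_sketch(X_min, n_new=40)
    print(X_syn.shape)                   # (40, 5)
```

In this sketch, the 2-D KPCA projection is used only to define the interpolation regions. The synthetic rows themselves are built in the original feature space, so they can be appended directly to the training data. Because every feature of a child lies between the corresponding values of its two parents, interpolation never leaves the cluster's bounding box. This is one plausible way of keeping new samples away from noisy regions, stated here as an assumption.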


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11008885
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0299585

Publication Analysis

Top Keywords

imbalanced datasets: 12
kco: 8
defect prediction: 8
kernel crossover: 8
crossover oversampling: 8
redundancy noise: 8
datasets: 5
data: 5
kco balancing: 4
balancing class: 4

Similar Publications

Adaptive ensemble loss and multi-scale attention in breast ultrasound segmentation with UMA-Net.

Med Biol Eng Comput

January 2025

Artificial Intelligence Lab, School of Computer and Information Sciences, University of Hyderabad, Hyderabad, 500046, India.

The generalization of deep learning (DL) models is critical for accurate lesion segmentation in breast ultrasound (BUS) images. Traditional DL models often struggle to generalize well due to the high frequency and scale variations inherent in BUS images. Moreover, conventional loss functions used in these models frequently result in imbalanced optimization, either prioritizing region overlap or boundary accuracy, which leads to suboptimal segmentation performance.


Deep CNN ResNet-18 based model with attention and transfer learning for Alzheimer's disease detection.

Front Neuroinform

January 2025

Department of Computer Science and Engineering, Institute of Technology, Nirma University, Gujarat, India.

Introduction: The prevalence of age-related brain issues has risen in developed countries because of changes in lifestyle. Alzheimer's disease leads to a rapid and irreversible decline in cognitive abilities by damaging memory cells.

Methods: A ResNet-18-based system is proposed that integrates depthwise convolution with a Squeeze-and-Excitation (SE) block to reduce the number of parameters to be tuned.


Robust predictive framework for diabetes classification using optimized machine learning on imbalanced datasets.

Front Artif Intell

January 2025

Department of Computer and Automatic Control, Faculty of Engineering, Tanta University, Tanta, Egypt.

Introduction: Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization.

Methods: A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed.


Existing studies indicate that dysregulation or abnormal expression of small nucleolar RNA (snoRNA) is closely associated with various diseases, including lung cancer. Furthermore, these diseases often involve multiple targets, making the redevelopment of traditional medicines highly promising. Accurate prediction of potential snoRNA therapeutic targets is essential for early disease intervention and the redevelopment of traditional medicines.


Breast cancer is one of the most common malignant tumors among women and poses a serious threat to women's health worldwide. With the continuous development of science and technology, an increasing number of advanced technologies are being applied to the diagnosis and prediction of breast cancer. In recent years, intelligent medical assistants supported by data mining and machine learning algorithms have provided valuable support for doctors' diagnoses.

