Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection.

Comput Intell Neurosci

School of Software, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea.

Published: March 2019

AI Article Synopsis

  • The KDD CUP 1999 intrusion detection dataset is a well-known dataset used for studying different types of cyber attacks, categorized into U2R, R2L, DoS, and Probe, with a 'normal' class added.
  • The U2R, R2L, and Probe classes are considered rare because each makes up less than 1% of the dataset, which creates a class imbalance problem.
  • This study focuses on optimizing the synthetic minority oversampling technique (SMOTE) ratios for these rare classes using support vector regression, resulting in improved performance of machine-learning techniques compared to previous methods.
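The rare-class definition and the SMOTE oversampling step can be made concrete with a small sketch. It assumes that a SMOTE ratio r means generating round(r × n) synthetic samples for a class with n samples, which matches the usual N% convention of SMOTE with r = N/100. The summary does not define the ratio, so this reading is an assumption. The helper names `rare_classes` and `smote`, the `threshold` and `k` parameters, and the toy labels are all hypothetical and illustrative. The nearest-neighbour search is brute force. On data the size of KDD CUP 1999, a library implementation such as imbalanced-learn's `SMOTE` would be the more practical choice.

```python
import random
from collections import Counter


def rare_classes(labels, threshold=0.01):
    """Return the set of classes whose share of `labels` is below `threshold`."""
    counts = Counter(labels)
    total = len(labels)
    return {cls for cls, n in counts.items() if n / total < threshold}


def smote(samples, ratio, k=5, seed=0):
    """SMOTE-style oversampling of one minority class (hypothetical helper).

    Generates round(ratio * len(samples)) synthetic points. Each point lies on
    the line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (squared Euclidean distance, brute force).
    Requires at least two samples.
    """
    rng = random.Random(seed)
    n_new = round(ratio * len(samples))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        x = samples[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (samples[j] for j in range(len(samples)) if j != i),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate: x + gap * (neighbour - x)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy labels (hypothetical): 1000 records in which three classes are under 1%.
labels = ["normal"] * 600 + ["dos"] * 395 + ["probe"] * 3 + ["r2l"] + ["u2r"]
print(sorted(rare_classes(labels)))  # ['probe', 'r2l', 'u2r']

square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(len(smote(square, ratio=2.0, k=3)))  # 8
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE fills in the region between existing rare-class records rather than duplicating them.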

Article Abstract

The KDD CUP 1999 intrusion detection dataset was introduced at the Third International Knowledge Discovery and Data Mining Tools Competition and has been widely used in many studies. The attack types in the KDD CUP 1999 dataset are divided into four categories: user to root (U2R), remote to local (R2L), denial of service (DoS), and Probe. We use five classes by adding the normal class. We define the U2R, R2L, and Probe classes, each of which makes up less than 1% of the total dataset, as rare classes. In this study, we attempt to mitigate the class imbalance of the dataset. Using the synthetic minority oversampling technique (SMOTE), we attempted to optimize the SMOTE ratios for the rare classes (U2R, R2L, and Probe). After randomly generating a number of tuples of SMOTE ratios, we used these tuples to build a numerical model for optimizing the SMOTE ratios of the rare classes; support vector regression was used to create this model. We assigned each instance in the test dataset to the model and chose the best SMOTE ratios. The machine-learning experiments were then conducted using the best ratios. The results of the proposed method were significantly better than those of the previous approach and other related work.
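The ratio search described in the abstract follows a surrogate-model pattern: score a set of randomly generated ratio tuples, fit a regressor from tuple to score, then query the cheap regressor instead of retraining a classifier for every candidate. The sketch below assumes scikit-learn's `SVR` as the support vector regressor. The objective `evaluate_ratios` is a hypothetical placeholder. In the study it would correspond to oversampling U2R, R2L, and Probe with the given ratios, training a classifier, and measuring its performance; here it is a noisy toy function. The search range of 0 to 5, the sample counts, and the SVR hyperparameters are likewise assumptions. This is not the authors' code, and the abstract does not specify how candidate tuples were scored against the model.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
LOW, HIGH = 0.0, 5.0  # assumed search range for each SMOTE ratio


def evaluate_ratios(ratios):
    """Hypothetical stand-in for: apply SMOTE with these (U2R, R2L, Probe)
    ratios, train a classifier, and return a validation score.
    Here it is a toy function that peaks at (3.0, 2.0, 1.0), plus noise."""
    target = np.array([3.0, 2.0, 1.0])
    return float(-np.sum((np.asarray(ratios) - target) ** 2) + rng.normal(0, 0.05))


# 1. Randomly generate tuples of SMOTE ratios and score each one (expensive step).
X_train = rng.uniform(LOW, HIGH, size=(60, 3))
y_train = np.array([evaluate_ratios(r) for r in X_train])

# 2. Fit a support vector regression model: ratio tuple -> predicted score.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
model.fit(X_train, y_train)

# 3. Query the surrogate on many candidate tuples and keep the best prediction.
candidates = rng.uniform(LOW, HIGH, size=(5000, 3))
predicted = model.predict(candidates)
best_ratios = candidates[np.argmax(predicted)]
print("best (U2R, R2L, Probe) ratios:", np.round(best_ratios, 2))
```

The design choice is to spend real training runs only on the initial random tuples; every later candidate costs one `predict` call. The chosen tuple would then be used for the final experiments with the actual classifiers.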


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236522 (PMC)
http://dx.doi.org/10.1155/2018/9704672 (DOI)

