Machine Learning with Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection

Ying Dar Lin, Zi Qiang Liu, Ren Hung Hwang*, Van Linh Nguyen, Po Ching Lin, Yuan Cheng Lai


Research output: Article (peer-reviewed)

5 citations (Scopus)


Driven by the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive numbers of attack variants, imbalanced data, and appropriate data segmentation. Handling these issues improperly significantly degrades ML performance, e.g., producing high false-negative rates and low recall. Despite many efforts in the literature, detecting security attacks in a complicated network environment with imperfect data collection remains an open problem. This work proposes a machine learning framework that combines a variational autoencoder with a multilayer perceptron model to deal with imbalanced datasets and to detect the growing number of attack variants on the Internet. The detection engine also includes an efficient range-based sequential search algorithm that addresses the segmentation challenge when pre-processing data from multiple sources (network packets, system/statistic logs). Our work is the first attempt to demonstrate the effect of using an appropriate combination of ML models to boost IDS detection capability in a heterogeneous environment, where imperfect data collection is common. Experimental results on a public system log dataset (e.g., HDFS) show that our method achieves up to approximately 97% F1 score and 98% recall, a promising result compared with the same measurements for other solutions. Moreover, we found that the proposed treatment of imbalanced datasets can improve the F1 score by up to 35% and recall by up to 27%. The testing results also indicate that our model can detect new attack variants.
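To illustrate why the abstract reports recall and F1 score rather than accuracy, the following minimal sketch computes these metrics from confusion-matrix counts on an imbalanced test set. The counts and the function name `recall_and_f1` are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def recall_and_f1(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (recall, F1) for the attack class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Hypothetical imbalanced test set: 10,000 normal records and 100 attacks.
# Assumed detector outcome: 40 attacks caught, 60 missed, 10 false alarms.
tp, fp, fn, tn = 40, 10, 60, 9990

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall, f1 = recall_and_f1(tp, fp, fn)
print(f"accuracy={accuracy:.4f} recall={recall:.2f} f1={f1:.2f}")
```

With these assumed counts, accuracy stays above 99% even though the detector misses 60 of 100 attacks (recall 0.40), which is the kind of degradation that imbalanced training data can hide.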

Pages (from - to): 15247-15260
Journal: IEEE Access
Publication status: Published - 2022

