Machine Learning with Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection

Ying Dar Lin, Zi Qiang Liu, Ren Hung Hwang*, Van Linh Nguyen, Po Ching Lin, Yuan Cheng Lai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Driven by the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive attack variants, imbalanced data, and appropriate data segmentation. Improper handling of these issues significantly degrades ML performance, e.g., resulting in high false-negative rates and low recall. Despite many efforts in the literature, detecting security attacks in a complicated network environment with imperfect data collection remains an open issue. This work proposes a machine learning framework that combines a variational autoencoder with a multilayer perceptron model to deal with imbalanced datasets and detect the growing number of attack variants on the Internet. The detection engine also includes an efficient range-based sequential search algorithm to address the segmentation challenge when pre-processing data from multiple sources (network packets, system/statistic logs). Our work is the first attempt to demonstrate the effect of using an appropriate combination of ML models to boost IDS detection capability in a heterogeneous environment, where imperfect data collection is common. Experimental results on a public system log dataset (e.g., HDFS) show that our method achieves up to approximately 97% F1 score and 98% recall, a promising result compared with the same measurements for other solutions. Moreover, the proposed treatment of imbalanced datasets improves the F1 score by up to 35% and the recall rate by up to 27%. The testing results also indicate that our model can detect new attack variants.
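To illustrate why the abstract reports recall and F1 score rather than plain accuracy, the following minimal sketch computes all three for an imbalanced test set. The counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the HDFS dataset, and the helper name `recall_and_f1` is an assumption of this sketch.

```python
from typing import Tuple


def recall_and_f1(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (recall, F1) from true-positive, false-positive and false-negative counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Hypothetical imbalanced test set: 1000 samples, of which only 20 are attacks.
# The detector raises no false alarms but misses 12 of the 20 attacks.
tp, fp, fn, tn = 8, 0, 12, 980

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall, f1 = recall_and_f1(tp, fp, fn)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this made-up case accuracy stays high because the benign majority dominates the count, while recall and F1 expose the missed attacks. This is the kind of degradation that the treatment of imbalanced data is meant to reduce.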

Original language: English
Pages (from-to): 15247-15260
Number of pages: 14
Journal: IEEE Access
State: Published - 2022


  • Imbalanced dataset
  • Intrusion detection
  • Machine learning
  • Variational autoencoder


