TY - JOUR
T1 - Two-stage multi-datasource machine learning for attack technique and lifecycle detection
AU - Lin, Ying Dar
AU - Yang, Shin Yi
AU - Sudyana, Didik
AU - Yudha, Fietyata
AU - Lai, Yuan Cheng
AU - Hwang, Ren Hung
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/7
Y1 - 2024/7
N2 - Intrusion detection systems (IDS) have increasingly adopted machine learning (ML) techniques to enhance their ability to detect a wide range of attack variants. However, current research focuses primarily on identifying specific attack types or techniques using a single data source. This approach lacks a holistic perspective on attacks, which can result in missed detections. To improve the effectiveness of responding to detected attacks, it is essential to identify them based on their lifecycles and incorporate information from multiple data sources. In this study, we present three distinct approaches for detecting attack lifecycles, each leveraging different ML methodologies: a single-stage ML model, a two-stage ML+ML approach, and ML with sequence matching (ML+SM). Simultaneously, we explore the benefits of utilizing multiple data sources, including network traffic, system logs, and host statistics, to enhance technique detection capabilities. Our evaluation of these methods reveals that on lifecycle detection, the two-stage ML+ML approach outperforms the others, achieving an impressive F1 score of 0.994. In contrast, the single-stage and ML+SM methods yield F1 scores of 0.887 and 0.189, respectively. Furthermore, the integration of multiple data sources proves highly advantageous, with the combination of all three sources yielding the highest F1 score of 0.922 on technique detection.
KW - Attack lifecycle detection
KW - ML-based IDS
KW - Multi-datasource IDS
KW - Two-stage lifecycle detection
UR - http://www.scopus.com/inward/record.url?scp=85191436797&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2024.103859
DO - 10.1016/j.cose.2024.103859
M3 - Article
AN - SCOPUS:85191436797
SN - 0167-4048
VL - 142
JO - Computers and Security
JF - Computers and Security
M1 - 103859
ER -