Root Cause Analysis in Microservice Using Neural Granger Causal Discovery

Cheng Ming Lin, Ching Chang, Wei Yao Wang, Kuang Da Wang, Wen Chih Peng

Research output: Conference contribution, peer-reviewed

4 citations (Scopus)

Abstract

In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintainability, and flexibility. However, when a system malfunctions, the complex relationships among microservices make it challenging for site reliability engineers (SREs) to pinpoint the root cause. Previous research employed structure learning methods (e.g., the PC-algorithm) to establish causal relationships and derive root causes from causal graphs. Nevertheless, these methods ignored the temporal order of time series data and failed to leverage the rich information inherent in temporal relationships. For instance, a sudden spike in CPU utilization can lead to an increase in latency for other microservices. In this scenario, the anomaly in CPU utilization occurs before the latency increase rather than simultaneously, so the PC-algorithm fails to capture it. To address these challenges, we propose RUN, a novel approach for root cause analysis using neural Granger causal discovery with contrastive learning. RUN enhances the backbone encoder by integrating contextual information from time series, and leverages a time series forecasting model to conduct neural Granger causal discovery. In addition, RUN incorporates PageRank with a personalization vector to efficiently recommend the top-k root causes. Extensive experiments conducted on synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms state-of-the-art root cause analysis methods. Moreover, we provide an analysis scenario for the sock-shop case to showcase the practicality and efficacy of RUN in microservice-based applications. Our code is publicly available at https://github.com/zmlin1998/RUN.
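
The final ranking step mentioned in the abstract, PageRank with a personalization vector, can be illustrated with a minimal sketch. This is not the RUN implementation: the three metric names, the causal edges, the uniform personalization vector, and the choice to walk from effects back to causes are all assumptions made for illustration.

```python
import numpy as np


def personalized_pagerank(adj, personalization, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank; adj[i, j] > 0 means a walker may step from node i to node j."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    p = np.asarray(personalization, dtype=float)
    p = p / p.sum()  # normalise the restart distribution
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalise; nodes with no outgoing edges jump according to p.
    trans = np.where(out > 0, adj / np.where(out > 0, out, 1.0), p)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * rank @ trans + (1.0 - damping) * p
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical causal graph: cpu_a -> latency_b -> latency_c (cause -> effect).
# Assumption: the walker moves from each effect back to its cause, so the
# score accumulates on upstream nodes.
nodes = ["cpu_a", "latency_b", "latency_c"]
walk = np.zeros((3, 3))
walk[1, 0] = 1.0  # latency_b steps back to cpu_a
walk[2, 1] = 1.0  # latency_c steps back to latency_b

# Hypothetical personalization vector: uniform restart over all metrics.
scores = personalized_pagerank(walk, personalization=[1.0, 1.0, 1.0])
top_k = [nodes[i] for i in np.argsort(-scores)[:2]]
print(top_k)
```

In this toy graph the upstream CPU metric receives the highest score, which matches the abstract's example of a CPU spike preceding latency increases elsewhere. A non-uniform personalization vector (for example, one weighted toward metrics with stronger anomalies) would bias the restart step toward those nodes.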

Original language: English
Title of host publication: Technical Tracks 14
Editors: Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 206-213
Number of pages: 8
Edition: 1
ISBN (Electronic): 1577358872, 9781577358879
Publication status: Published - 25 Mar 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 1
Volume: 38
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory: Canada
City: Vancouver
Period: 20/02/24 to 27/02/24
