On Detecting Cloud Container Failures from Computing Utility Sequences

Yu Shao Liu, Hsu Chao Lai, Jiun Long Huang, August F.Y. Chao

Research output: Conference contribution (peer-reviewed)

Abstract

As cloud platforms and containers grow rapidly in popularity, managing clouds has become an important issue. For example, a failed container on a cloud platform triggers an automatic restart mechanism. However, containers that fail because of user error cannot be fixed by a restart and may enter a loop of failure and restart, which harms the overall performance of the cloud. In this paper, we propose a machine learning approach to identifying possible container failures that factors in the utility behavior of containers (e.g., CPU usage, GPU usage, and I/O throughput). We propose a lightweight neural network, EEGNet-SE, to support fast real-time inference. In addition, EEGNet-SE is able to distinguish the dynamic relations among utilities for different tasks. We conduct experiments on a real cloud container dataset from the Taiwan Cloud Computing (TWCC) platform. Experimental results show that EEGNet-SE improves both performance and efficiency, and outperforms other state-of-the-art methods in terms of accuracy.
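
The abstract does not spell out how EEGNet-SE relates the utilities to one another. Assuming the "SE" suffix refers to a squeeze-and-excitation style channel gate (an assumption, not something this record confirms), the idea of weighting each utility per input can be sketched as below. The function names, shapes, and sample values are hypothetical, and the weights are random rather than trained, so this is an illustration of the gating mechanism only, not the authors' model.

```python
import math
import random

def squeeze_excite(seq, w1, w2):
    """Reweight each utility channel of `seq` with an SE-style gate.

    seq: list of C channels, each a list of T samples (e.g. CPU, GPU, I/O).
    w1:  R x C weight matrix (bottleneck); w2: C x R weight matrix.
    Returns (gates, reweighted_seq).
    """
    # Squeeze: summarise each channel by its mean over time.
    squeezed = [sum(ch) / len(ch) for ch in seq]
    # Excitation, step 1: bottleneck layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: one value per channel, squashed into (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, seq)]
    return gates, scaled

def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
# Hypothetical utility sequence, normalised to [0, 1]: CPU, GPU, I/O over 5 steps.
utilities = [
    [0.90, 0.95, 0.92, 0.97, 0.93],
    [0.00, 0.00, 0.01, 0.00, 0.00],
    [0.10, 0.12, 0.08, 0.11, 0.09],
]
w1 = random_matrix(2, 3, rng)  # bottleneck of size 2 (assumed)
w2 = random_matrix(3, 2, rng)
gates, reweighted = squeeze_excite(utilities, w1, w2)
```

Because the gates are computed from the input sequence itself, a different container workload yields different per-utility weights, which is one plausible reading of "dynamic relations" in the abstract.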

Original language: English
Title of host publication: 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-361
Number of pages: 4
ISBN (Electronic): 9784885523328
Publication status: Published - 8 Sep 2021
Event: 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021 - Virtual, Online, Taiwan
Duration: 8 Sep 2021 to 10 Sep 2021

Publication series

Name: 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021

Conference

Conference: 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Country/Territory: Taiwan
City: Virtual, Online
Period: 8/09/21 to 10/09/21
