TY - GEN
T1 - On Detecting Cloud Container Failures from Computing Utility Sequences
AU - Liu, Yu Shao
AU - Lai, Hsu Chao
AU - Huang, Jiun Long
AU - Chao, August F.Y.
N1 - Publisher Copyright: © 2021 IEICE.
PY - 2021/9/8
Y1 - 2021/9/8
N2 - As cloud platforms and containers grow rapidly in popularity, managing clouds has become an important issue. For example, a failed container on a cloud platform triggers an automatic restart mechanism. However, containers that fail because of user error cannot be fixed by restarting and may fall into a loop of failure and restart. Such looping failures harm the overall performance of the cloud. In this paper, we propose a machine learning approach to identify possible container failures that takes the utility behavior of containers (e.g., CPU usage, GPU usage, and I/O throughput) into account. We propose a lightweight neural network, EEGNet-SE, that supports fast real-time inference. In addition, EEGNet-SE can distinguish the dynamic relations among utilities for different tasks. We build a real cloud container dataset from the Taiwan Cloud Computing (TWCC) platform. Experimental results show that EEGNet-SE improves performance and efficiency simultaneously and outperforms other state-of-the-art methods in accuracy.
AB - As cloud platforms and containers grow rapidly in popularity, managing clouds has become an important issue. For example, a failed container on a cloud platform triggers an automatic restart mechanism. However, containers that fail because of user error cannot be fixed by restarting and may fall into a loop of failure and restart. Such looping failures harm the overall performance of the cloud. In this paper, we propose a machine learning approach to identify possible container failures that takes the utility behavior of containers (e.g., CPU usage, GPU usage, and I/O throughput) into account. We propose a lightweight neural network, EEGNet-SE, that supports fast real-time inference. In addition, EEGNet-SE can distinguish the dynamic relations among utilities for different tasks. We build a real cloud container dataset from the Taiwan Cloud Computing (TWCC) platform. Experimental results show that EEGNet-SE improves performance and efficiency simultaneously and outperforms other state-of-the-art methods in accuracy.
KW - Cloud Container Failure Detection
KW - Multivariate Time-series Classification
KW - Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85118105108&partnerID=8YFLogxK
U2 - 10.23919/APNOMS52696.2021.9562640
DO - 10.23919/APNOMS52696.2021.9562640
M3 - Conference contribution
AN - SCOPUS:85118105108
T3 - 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
SP - 358
EP - 361
BT - 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Y2 - 8 September 2021 through 10 September 2021
ER -