On Detecting Cloud Container Failures from Computing Utility Sequences

Yu Shao Liu, Hsu Chao Lai, Jiun Long Huang, August F.Y. Chao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

As cloud platforms and containers grow rapidly in popularity, managing clouds has become an important issue. For example, a failed container on a cloud platform triggers an automatic restart mechanism. However, failures caused by user error cannot be fixed by a restart and may lead to a loop between failure and restart. Such looping failures harm the overall performance of the cloud. In this paper, we propose a machine learning approach to identify possible container failures that factors in the utility behavior of containers (e.g., CPU usage, GPU usage, and I/O throughput). We propose a lightweight neural network, EEGNet-SE, to support fast inference in real time. In addition, EEGNet-SE is able to distinguish the dynamic relations among utilities for different tasks. We conduct experiments on a real cloud container dataset from the Taiwan Cloud Computing (TWCC) platform. Experimental results show that EEGNet-SE improves performance and efficiency simultaneously and outperforms other state-of-the-art methods in terms of accuracy.

Original language: English
Title of host publication: 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-361
Number of pages: 4
ISBN (Electronic): 9784885523328
State: Published - 8 Sep 2021
Event: 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021 - Virtual, Online, Taiwan
Duration: 8 Sep 2021 to 10 Sep 2021

Publication series

Name: 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021

Conference

Conference: 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Country/Territory: Taiwan
City: Virtual, Online
Period: 8/09/21 to 10/09/21

Keywords

  • Cloud Container Failure Detection
  • Multivariate Time-series Classification
  • Neural Network
