TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers

Yu Yuan Liu, Hong Sheng Zheng, Yu Fang Hu, Chen Fong Hsu, Tsung Tai Yeh

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

Deploying deep neural network (DNN) models on microcontroller units (MCUs) is typically limited by the tight SRAM memory budget. Machine learning frameworks have often allocated tensor memory layer-wise, but this can raise out-of-memory exceptions when a DNN model includes a large tensor. Patch-based inference, another prior solution, reduces peak SRAM usage by dividing a tensor into small patches and storing one patch at a time; however, executing these overlapping patches takes significantly longer to complete inference, which is undesirable on MCUs. We resolve these problems with a novel DNN model compiler, TinyTS. In TinyTS, our tensor partition method creates a tensor-splitting model that eliminates the redundant computation observed in patch-based inference. Furthermore, the TinyTS memory planner significantly reduces peak SRAM usage by releasing the memory of split tensors that are no longer needed, making it available to other ready split tensors before the entire tensor finishes executing. Finally, TinyTS applies several optimization techniques to eliminate the metadata storage and runtime overhead of executing many fine-grained split tensors. Using the TensorFlow Lite for Microcontrollers (TFLM) framework as a baseline, we evaluated the effectiveness of TinyTS. TinyTS reduces the peak SRAM usage of 9 TinyML models by up to 5.92X over the baseline and achieves a geometric-mean speedup of 8.83X over patch-based inference. By resolving these two key issues in deploying DNN models on MCUs, TinyTS substantially improves memory usage efficiency for TinyML applications. The source code of TinyTS is available at https://github.com/nycu-caslab/TinyTS
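The early-release idea behind the memory planner can be illustrated with a small liveness sketch. This is not TinyTS's actual algorithm, just a minimal assumed model: each operator lists the split tensors it produces and consumes, and a buffer is freed right after its last consumer runs rather than being held until the whole layer tensor completes.

```python
# Illustrative sketch (hypothetical, not the TinyTS planner): compute peak
# live memory when each split tensor's buffer is released immediately
# after its last consumer, instead of holding whole layer tensors.

def peak_memory(ops, sizes):
    """ops: ordered list of (produced, consumed) tensor-name lists.
    sizes: bytes per tensor. Returns peak live bytes with early release."""
    # Find the last operator index that reads each tensor.
    last_use = {}
    for i, (_, consumed) in enumerate(ops):
        for t in consumed:
            last_use[t] = i
    live, cur, peak = set(), 0, 0
    for i, (produced, consumed) in enumerate(ops):
        for t in produced:            # allocate outputs of this operator
            live.add(t)
            cur += sizes[t]
        peak = max(peak, cur)
        for t in consumed:            # free inputs at their last use
            if last_use[t] == i and t in live:
                live.remove(t)
                cur -= sizes[t]
    return peak

# Two independent halves of a layer: only one input split needs to be
# resident at a time, so peak is 300 bytes instead of the 400 bytes a
# layer-wise allocator (whole input + whole output) would need.
ops = [(["a0"], []), (["b0"], ["a0"]),
       (["a1"], []), (["b1"], ["a1"])]
sizes = {"a0": 100, "a1": 100, "b0": 100, "b1": 100}
print(peak_memory(ops, sizes))  # → 300
```

The example shows why splitting helps only when paired with early release: without freeing `a0` before `a1` is allocated, the split model would use as much SRAM as the layer-wise one.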

Original language: English
Host publication title: Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
Publisher: IEEE Computer Society
Pages: 848-860
Number of pages: 13
ISBN (electronic): 9798350393132
DOIs
Publication status: Published - 2024
Event: 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 - Edinburgh, United Kingdom
Duration: 2 Mar 2024 – 6 Mar 2024

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (print): 1530-0897

Conference

Conference: 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
Country/Territory: United Kingdom
City: Edinburgh
Period: 2/03/24 – 6/03/24
