TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers

Yu Yuan Liu, Hong Sheng Zheng, Yu Fang Hu, Chen Fong Hsu, Tsung Tai Yeh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deploying deep neural network (DNN) models on microcontroller units (MCUs) is typically constrained by the tight SRAM budget. Existing machine learning frameworks often allocate tensor memory layer-wise, which causes out-of-memory failures when a DNN model contains a large tensor. Patch-based inference, another prior solution, reduces peak SRAM usage by dividing a tensor into small patches and storing only one patch at a time; however, these patches overlap, so executing them takes significantly longer to complete inference, which is undesirable on MCUs. We resolve these problems with TinyTS, a novel DNN model compiler. In TinyTS, our tensor-partitioning method creates a tensor-splitting model that eliminates the redundant computation observed in patch-based inference. Furthermore, the TinyTS memory planner significantly reduces peak SRAM usage by releasing the memory of split tensors that are no longer needed, making it available to other ready split tensors before the entire tensor has been computed. Finally, TinyTS introduces optimization techniques that eliminate the metadata-storage and runtime overheads of executing many fine-grained split tensors. Using the TensorFlow Lite for Microcontrollers (TFLM) framework as the baseline, we evaluated the effectiveness of TinyTS. TinyTS reduces the peak SRAM usage of nine TinyML models by up to 5.92X over the baseline and achieves a geometric-mean speedup of 8.83X over patch-based inference. By resolving these two key issues in deploying DNN models on MCUs, TinyTS substantially improves memory efficiency for TinyML applications. The source code of TinyTS is available at https://github.com/nycu-caslab/TinyTS.
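The core memory-saving idea in the abstract, releasing a split tensor's buffer as soon as its consumers finish instead of keeping whole layer tensors live, can be illustrated with a toy peak-SRAM model. The Python sketch below is illustrative only and is not TinyTS's actual planner: it assumes a linear chain of layers, a 1:1 dependency between input and output splits (ignoring the halo overlap that TinyTS's tensor-partition method must handle), and made-up activation sizes.

def layerwise_peak(acts):
    # Layer-wise planning: for every layer, its full input and full
    # output tensors are live in SRAM at the same time.
    return max(a + b for a, b in zip(acts, acts[1:]))

def split_peak(acts, n):
    # Split-tensor planning with early release: each tensor is cut into
    # n splits, and an input split is freed as soon as the output split
    # that consumes it has been produced (hypothetical 1:1 dependency).
    peak = 0
    for in_b, out_b in zip(acts, acts[1:]):
        si, so = in_b // n, out_b // n
        for i in range(n):
            # While computing output split i, input splits i..n-1 are
            # still live and output splits 0..i are already materialized.
            peak = max(peak, (n - i) * si + (i + 1) * so)
    return peak

acts = [96_000, 48_000, 24_000, 12_000]   # toy activation sizes in bytes
print(layerwise_peak(acts))   # 144000: largest input+output pair
print(split_peak(acts, 4))    # 108000: early release shrinks the live set

Even in this simplified model, finer splits push the peak toward the size of the largest single tensor plus one split. The actual planner additionally has to account for overlapping receptive fields and for the metadata cost of tracking many small buffers, which the paper's optimizations address.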

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
Publisher: IEEE Computer Society
Pages: 848-860
Number of pages: 13
ISBN (Electronic): 9798350393132
DOIs
State: Published - 2024
Event: 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 - Edinburgh, United Kingdom
Duration: 2 Mar 2024 – 6 Mar 2024

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print): 1530-0897

Conference

Conference: 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
Country/Territory: United Kingdom
City: Edinburgh
Period: 2/03/24 – 6/03/24

Keywords

  • AIoT
  • Compiler
  • Deep Neural Network
  • TinyML
