TY - JOUR
T1 - StreamNet
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
AU - Zheng, Hong Sheng
AU - Hsu, Chen Fong
AU - Liu, Yu Yuan
AU - Yeh, Tsung Tai
N1 - Publisher Copyright:
© 2023 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - With emerging Tiny Machine Learning (TinyML) inference applications, there is growing interest in deploying TinyML models on low-power Microcontroller Units (MCUs). However, deploying TinyML models on MCUs reveals several challenges due to the MCU's resource constraints, such as small flash memory, a tight SRAM memory budget, and slow CPU performance. Unlike typical layer-wise inference, patch-based inference reduces the peak SRAM memory usage on MCUs by saving small patches rather than the entire tensor in SRAM memory. However, patch-based inference tremendously increases the number of MACs relative to the layer-wise method, and this notorious computational overhead makes patch-based inference undesirable on MCUs. This work designs StreamNet, which employs a stream buffer to eliminate the redundant computation of patch-based inference. StreamNet uses 1D and 2D streaming processing and provides a parameter selection algorithm that automatically improves the performance of patch-based inference with minimal requirements on the MCU's SRAM memory space. Across 10 TinyML models, StreamNet-2D achieves a geometric mean speedup of 7.3X and saves 81% of MACs over state-of-the-art patch-based inference.
AB - With emerging Tiny Machine Learning (TinyML) inference applications, there is growing interest in deploying TinyML models on low-power Microcontroller Units (MCUs). However, deploying TinyML models on MCUs reveals several challenges due to the MCU's resource constraints, such as small flash memory, a tight SRAM memory budget, and slow CPU performance. Unlike typical layer-wise inference, patch-based inference reduces the peak SRAM memory usage on MCUs by saving small patches rather than the entire tensor in SRAM memory. However, patch-based inference tremendously increases the number of MACs relative to the layer-wise method, and this notorious computational overhead makes patch-based inference undesirable on MCUs. This work designs StreamNet, which employs a stream buffer to eliminate the redundant computation of patch-based inference. StreamNet uses 1D and 2D streaming processing and provides a parameter selection algorithm that automatically improves the performance of patch-based inference with minimal requirements on the MCU's SRAM memory space. Across 10 TinyML models, StreamNet-2D achieves a geometric mean speedup of 7.3X and saves 81% of MACs over state-of-the-art patch-based inference.
UR - http://www.scopus.com/inward/record.url?scp=85191155173&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85191155173
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 10 December 2023 through 16 December 2023
ER -