TY - GEN
T1 - An Efficient and Low-Power MLP Accelerator Architecture Supporting Structured Pruning, Sparse Activations and Asymmetric Quantization for Edge Computing
AU - Lin, Wei Chen
AU - Chang, Ya Chu
AU - Huang, Juinn Dar
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/6
Y1 - 2021/6/6
AB - Multilayer perceptron (MLP) is one of the most popular neural network architectures for classification, regression, and recommendation systems today. In this paper, we propose an efficient and low-power MLP accelerator for edge computing. The accelerator has three key features. First, it is co-designed with a novel structured weight pruning algorithm that requires only minimal hardware support. Second, it exploits activation sparsity to minimize power consumption. Third, it supports asymmetric quantization of both weights and activations to improve model accuracy, especially when those values are represented in low-precision formats. Furthermore, the number of processing elements (PEs) is determined by the available external memory bandwidth to ensure high PE utilization, avoiding wasted area and energy. Experimental results show that the proposed MLP accelerator, with only 8 MACs, operates at 1.6 GHz in TSMC 40 nm technology, delivers an equivalent computing power of 899 GOPS after structured weight pruning on a well-known image classification model, and achieves an equivalent energy efficiency of 9.7 TOPS/W, while keeping the model accuracy loss below 0.3% with the help of asymmetric quantization.
KW - algorithm-hardware co-design
KW - hardware accelerator
KW - model compression
KW - multilayer perceptron
UR - http://www.scopus.com/inward/record.url?scp=85113342039&partnerID=8YFLogxK
U2 - 10.1109/AICAS51828.2021.9458511
DO - 10.1109/AICAS51828.2021.9458511
M3 - Conference contribution
AN - SCOPUS:85113342039
T3 - 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems, AICAS 2021
BT - 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems, AICAS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2021
Y2 - 6 June 2021 through 9 June 2021
ER -