A 28nm Energy-Area-Efficient Row-based pipelined Training Accelerator with Mixed FXP4/FP16 for On-Device Transfer Learning

Wei Lu*, Han Hsiang Pei, Jheng Rong Yu, Hung Ming Chen, Po Tsang Huang

*Corresponding author for this work

Research output: Conference contribution (peer-reviewed)

Abstract

Compared with inference, training deep convolutional neural networks (DNNs) requires significantly more computation, more complex dataflow, more memory accesses and data movement among processing elements (PEs), and higher bit precision for backpropagation (BP), all of which increase power and area overhead. Energy and area efficiency are therefore critical concerns for mobile/edge devices. This work proposes a row-based pipelined DNN training accelerator that employs three techniques to improve energy and area efficiency on resource-constrained edge/mobile devices. The first technique freezes weight updates in the convolution and batch normalization layers. The second technique decomposes the simulated quantization of convolutional layers and reorganizes the operations of batch normalization layers. A mathematical derivation shows that floating-point (FP) convolution operations can be completed using fixed-point (FXP) calculations, so FXP MACs followed by a dequantizer can replace the original FP MACs in convolutional layers. In addition, a row-based FXP/FP pipelined training accelerator is designed for layer pipelining and for convolution and batch normalization layers, which increases FXP and FP resource utilization. The third technique uses multi-bank buffer management to prevent data conflicts and to reduce on-chip buffer requirements by up to 3.5 times. The proposed accelerator was implemented in the TSMC 28nm CMOS process and achieves an energy efficiency of 2.19 TFLOPS/W and an area efficiency of 85.32 GFLOPS/mm². It outperforms state-of-the-art works with 6.8 times the area efficiency and 3.7 times the energy efficiency.
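
The claim that FP convolution can be completed with FXP calculations plus a dequantizer can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes symmetric per-tensor quantization to a signed 4-bit range, a 1-D convolution, and float64 in place of FP16. The names `quantize_fxp4`, `conv1d_valid`, `s_x`, and `s_w` are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize_fxp4(x, scale):
    """Map real values to signed 4-bit integers in [-8, 7] (assumed symmetric scheme)."""
    return np.clip(np.round(x / scale), -8, 7).astype(np.int32)

def conv1d_valid(x, w):
    """1-D 'valid' correlation written as explicit multiply-accumulates."""
    n = len(x) - len(w) + 1
    return np.array([np.sum(x[i:i + len(w)] * w) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.normal(size=16)          # activations (illustrative data)
w = rng.normal(size=3)           # weights (illustrative data)
s_x = np.max(np.abs(x)) / 7      # assumed per-tensor scale for activations
s_w = np.max(np.abs(w)) / 7      # assumed per-tensor scale for weights

q_x = quantize_fxp4(x, s_x)
q_w = quantize_fxp4(w, s_w)

# Simulated quantization: dequantize the operands first, then use FP MACs.
y_sim = conv1d_valid(q_x * s_x, q_w * s_w)

# Decomposed form: integer MACs, then a single dequantizing multiply per output.
acc = conv1d_valid(q_x, q_w)     # stays in integer arithmetic
y_fxp = acc * (s_x * s_w)

assert np.allclose(y_sim, y_fxp)
```

Because both scales are constant across the accumulation, they factor out of the sum, so the multiply-accumulate loop only ever touches integers and the floating-point work shrinks to one multiplication per output value.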

Original language: English
Title of host publication: ISCAS 2024 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330991
Publication status: Published - 2024
Event: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024 - Singapore, Singapore
Duration: 19 May 2024 to 22 May 2024

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310

Conference

Conference: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Country/Territory: Singapore
City: Singapore
Period: 19/05/24 to 22/05/24
