A 28nm Energy-Area-Efficient Row-based pipelined Training Accelerator with Mixed FXP4/FP16 for On-Device Transfer Learning

Wei Lu*, Han Hsiang Pei, Jheng Rong Yu, Hung Ming Chen, Po Tsang Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Training deep convolutional neural networks (DNNs) requires significantly more computational capacity, more complex dataflow, more memory accesses and data movement among processing elements (PEs), and higher bit precision for back-propagation (BP) than DNN inference, which demands greater power and area overhead. For mobile/edge devices, energy and area efficiency are critical concerns. This research proposes a row-based pipelined DNN training accelerator that employs three techniques to improve energy and area efficiency for resource-constrained edge/mobile devices. The first technique freezes weight updates in convolution and batch normalization layers. The second technique decomposes the simulated quantization of convolutional layers and reorganizes the operations of batch normalization layers; a mathematical derivation shows that floating-point (FP) convolution operations can be completed using fixed-point (FXP) calculations, so FXP MACs followed by a dequantizer can replace the original FP MACs in convolutional layers. In addition, a row-based FXP/FP pipelined training accelerator is designed to pipeline convolution and batch normalization layers, increasing FXP and FP resource utilization. The third technique uses multi-bank buffer management to prevent data conflicts and reduce on-chip buffer requirements by up to 3.5 times. Implemented in a TSMC 28nm CMOS process, the proposed accelerator achieves an energy efficiency of 2.19 TFLOPS/W and an area efficiency of 85.32 GFLOPS/mm². It outperforms state-of-the-art works with 6.8 times the area efficiency and 3.7 times the energy efficiency.
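The decomposition behind the second technique can be illustrated with a toy numeric sketch (helper names are hypothetical, and power-of-two scales are assumed so the two computation paths match bit-exactly): with symmetric quantization q(x) = s·round(x/s), a convolution over simulated-quantized FP values equals an integer (FXP) MAC followed by a single dequantization multiply by the product of the scales.

```python
def quantize_fxp4(x, scale):
    """Symmetric 4-bit quantization: integer code clipped to [-8, 7]."""
    return max(-8, min(7, round(x / scale)))

def dot_fp(w_q, a_q, sw, sa):
    # Reference path: FP MAC over the simulated-quantized (dequantized) values.
    return sum((qw * sw) * (qa * sa) for qw, qa in zip(w_q, a_q))

def dot_fxp_then_dequant(w_q, a_q, sw, sa):
    # Decomposed path: pure-integer (FXP) MAC, then one dequantization multiply.
    acc = sum(qw * qa for qw, qa in zip(w_q, a_q))  # int-only accumulation
    return acc * (sw * sa)                          # dequantizer

# Toy data with power-of-two scales (assumed, so FP rounding cannot differ).
sw, sa = 0.0625, 0.125  # 2**-4 and 2**-3
w_q = [quantize_fxp4(v, sw) for v in [0.12, -0.30, 0.07]]   # -> [2, -5, 1]
a_q = [quantize_fxp4(v, sa) for v in [0.50, 0.20, -0.40]]   # -> [4, 2, -3]

assert dot_fp(w_q, a_q, sw, sa) == dot_fxp_then_dequant(w_q, a_q, sw, sa)
```

The equality is what lets the accelerator keep only one dequantization unit per output instead of performing every MAC in floating point.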

Original language: English
Title of host publication: ISCAS 2024 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330991
DOIs
State: Published - 2024
Event: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024 - Singapore, Singapore
Duration: 19 May 2024 - 22 May 2024

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310

Conference

Conference: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Country/Territory: Singapore
City: Singapore
Period: 19/05/24 - 22/05/24

Keywords

  • DNN Training accelerator
  • Frozen weights
  • Multi-bank buffer management
  • On-device transfer learning
  • Simulated quantization
