A 28nm 343.5fps/W Vision Transformer Accelerator with Integer-Only Quantized Attention Block

Cheng Chen Lin*, Wei Lu, Po Tsang Huang, Hung Ming Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision tasks. For mobile/edge devices, energy efficiency is the most important concern; however, ViTs require huge amounts of computation and storage, which makes them difficult to deploy on such devices. In this work, we improve the efficiency of ViT inference at both the algorithm level and the hardware level. At the algorithm level, we propose an energy-efficient ViT model that adopts 4-bit quantization and low-rank approximation to convert all non-linear functions operating on floating-point (FP) values in Multi-Head Attention (MHA) into linear functions operating on integer (INT) values, reducing both computation and storage overhead. The accuracy drop compared with the full-precision model is small (<1.5%). At the hardware level, we design an energy-efficient row-based pipelined ViT accelerator for on-device inference. The proposed accelerator consists of an integer-only quantizer, an integer MAC PE array that executes quantization and matrix operations, and an approximated-linear block that executes the low-rank approximation. To the best of our knowledge, this is the first ViT accelerator that uses 4-bit quantization and a dedicated quantizer for integer-only on-device inference. This work achieves an energy efficiency of 343.5 fps/W, up to 8x better than state-of-the-art works.
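As a rough illustration of the integer-only attention idea described in the abstract, the following Python sketch quantizes query/key activations to 4-bit integers and computes Q·K^T with integer MACs and int32 accumulation. This is a minimal sketch under stated assumptions: symmetric per-tensor quantization, and toy scale choices; the function names and scales are illustrative, not the paper's actual quantizer design.

    import numpy as np

    def quantize_int4(x, scale):
        # Symmetric 4-bit quantization: round to nearest integer,
        # clamp to the signed 4-bit range [-8, 7], store in int8.
        return np.clip(np.round(x / scale), -8, 7).astype(np.int8)

    def int_attention_scores(q_int, k_int, scale_q, scale_k):
        # Integer-only Q @ K^T with int32 accumulation; the single FP
        # rescale factor is returned separately so it can be folded into
        # a downstream linearized (low-rank approximated) softmax stage.
        scores = q_int.astype(np.int32) @ k_int.astype(np.int32).T
        return scores, scale_q * scale_k

    # Usage: quantize FP query/key activations, then compute attention
    # scores entirely in integer arithmetic.
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((4, 8)).astype(np.float32)
    K = rng.standard_normal((4, 8)).astype(np.float32)
    s_q, s_k = Q.std() / 4, K.std() / 4  # toy per-tensor scales (assumption)
    scores_int, s_out = int_attention_scores(
        quantize_int4(Q, s_q), quantize_int4(K, s_k), s_q, s_k)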

Original language: English
Title of host publication: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 80-84
Number of pages: 5
ISBN (Electronic): 9798350383638
State: Published - 2024
Event: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024 - Abu Dhabi, United Arab Emirates
Duration: 22 Apr 2024 – 25 Apr 2024

Publication series

Name: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings

Conference

Conference: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 22/04/24 – 25/04/24

Keywords

  • Integer-Only Quantization
  • Low-Rank Approximation
  • On-Device Inference
  • Vision Transformer (ViT)
