Hardware Accelerator for MobileViT Vision Transformer with Reconfigurable Computation

Shen Fu Hsiao*, Tzu Hsien Chao, Yen Che Yuan, Kun Chih Chen

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

3 citations (Scopus)

Abstract

With the great success of the Transformer model in Natural Language Processing (NLP), the Vision Transformer (ViT) was proposed, achieving performance comparable to traditional Convolutional Neural Network (CNN) models in tasks such as image classification and object detection. This paper focuses on accelerating a new lightweight hybrid model, MobileViT, which has lower computational complexity and higher accuracy than ViT and CNN-based lightweight models such as MobileNets. We introduce an adaptive systolic array (SA) design with a flexible shape, called LEGO SA, that improves hardware utilization and memory-access efficiency during standard convolution, Depth-wise Separable Convolution (DWC), and self-attention operations. Furthermore, the matrix transpose in self-attention is implemented efficiently, significantly reducing wasted execution time, memory buffers, and power consumption. The proposed MobileViT hardware accelerator, with 112 KB of on-chip buffers, occupies an area of just 1.64 mm^2 in the TSMC 40 nm process and achieves a performance of 1.2 TOPS at 600 MHz with an energy efficiency of 5.34 TOPS/W.
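The headline figures in the abstract imply a few further quantities that are not stated there. The short Python sketch below derives them by plain arithmetic. It assumes that the 1.2 TOPS, 600 MHz, and 5.34 TOPS/W figures all describe the same operating point, and that one multiply-accumulate (MAC) counts as two operations; neither assumption is confirmed by the abstract, and the function and parameter names are illustrative only.

```python
def derived_metrics(tops=1.2, freq_mhz=600.0, tops_per_watt=5.34,
                    area_mm2=1.64, ops_per_mac=2):
    """Derive implied figures from the numbers quoted in the abstract.

    Assumptions (not stated in the source): all quoted figures refer to
    the same operating point, and one MAC counts as `ops_per_mac` ops.
    """
    ops_per_second = tops * 1e12
    cycles_per_second = freq_mhz * 1e6
    ops_per_cycle = ops_per_second / cycles_per_second
    return {
        # Operations the array would need to complete every clock cycle
        "ops_per_cycle": ops_per_cycle,
        # Equivalent MAC units kept busy per cycle, under the MAC assumption
        "macs_per_cycle": ops_per_cycle / ops_per_mac,
        # Implied power draw: throughput divided by energy efficiency
        "power_w": tops / tops_per_watt,
        # Implied area efficiency
        "tops_per_mm2": tops / area_mm2,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the figures work out to roughly 2000 operations per cycle (about 1000 MACs per cycle), an implied power of about 0.22 W, and about 0.73 TOPS/mm^2. These are back-of-the-envelope values, not results reported by the authors.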

Original language: English
Title of host publication: ISCAS 2024 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330991
Publication status: Published - 2024
Event: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024 - Singapore, Singapore
Duration: 19 May 2024 to 22 May 2024

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310

Conference

Conference: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Country/Territory: Singapore
City: Singapore
Period: 19/05/24 to 22/05/24
