An SoC Integration Ready VLIW-Driven CNN Accelerator with High Utilization and Scalability

Chia Heng Hu, I. Hao Tseng, Pei Hsuan Kuo, Juinn Dar Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, a highly scalable VLIW-driven CNN accelerator architecture is proposed. A new VLIW instruction is defined that specifies all settings of an entire convolution layer and natively supports layer concatenation. A multi-mode input aligner (MMIA) is developed to efficiently organize input data for various convolution modes. A zero-initial-latency (ZIL) buffer is created to further boost performance. A strip-based dataflow is adopted to drastically reduce external DRAM accesses. The accelerator is also equipped with an AXI4 on-chip bus interface, an instruction queue, and ping-pong DRAM I/O buffers, and is thus ready for efficient and easy SoC integration. An accelerator instance with 576 MACs has been implemented in a TSMC 40nm process. The core logic requires only 490K gates, and the total internal memory size is merely 286KB. The peak performance is 1440 GOPS at 1.25GHz, and the core power efficiency is 8.71 TOPS/W. Moreover, the proposed accelerator has enabled a real-time image semantic segmentation system for autonomous driving on an FPGA system.
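The headline throughput figure can be checked with simple arithmetic. The sketch below is not from the paper: it assumes each MAC counts as two operations per cycle (one multiply and one add), which is the usual convention and reproduces the quoted 1440 GOPS. The implied core power is a further assumption, valid only if the 8.71 TOPS/W figure was measured at the same peak operating point, which the abstract does not state. The function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption (not stated in the abstract): 1 MAC = 2 operations per cycle.

def peak_gops(num_macs: int, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in GOPS for a fully utilized MAC array."""
    return num_macs * ops_per_mac * clock_ghz


def implied_core_power_w(gops: float, tops_per_watt: float) -> float:
    """Core power in watts implied by a throughput and an efficiency figure.

    Only meaningful if both numbers refer to the same operating point,
    which is an assumption here.
    """
    return (gops / 1000.0) / tops_per_watt


if __name__ == "__main__":
    gops = peak_gops(num_macs=576, clock_ghz=1.25)
    print(f"peak throughput: {gops:.0f} GOPS")
    print(f"implied core power: {implied_core_power_w(gops, 8.71) * 1000:.0f} mW")
```

Under these assumptions, 576 MACs × 2 ops × 1.25 GHz matches the stated 1440 GOPS, and the implied core power is on the order of 165 mW.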

Original language: English
Title of host publication: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 246-249
Number of pages: 4
ISBN (Electronic): 9781665409964
DOIs
State: Published - 2022
Event: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022 - Incheon, Korea, Republic of
Duration: 13 Jun 2022 → 15 Jun 2022

Publication series

Name: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022

Conference

Conference: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Country/Territory: Korea, Republic of
City: Incheon
Period: 13/06/22 → 15/06/22

Keywords

  • convolutional neural network (CNN)
  • hardware accelerator
  • high performance
  • low power
  • SoC integration ready
  • very long instruction word (VLIW)
