Dataflow and microarchitecture co-optimisation for sparse CNN on distributed processing element accelerator

Duc-An Pham, Bo-Cheng Lai*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Accelerators that exploit the sparsity of both the activation data and the network structure of convolutional neural networks (CNNs) have been shown to process CNNs efficiently and with superior performance. Previous studies have identified three critical design concerns for sparse CNN accelerators: data reuse, parallel computing performance, and effective sparse computation. Earlier accelerator designs each addressed some of these factors, but none considered all of them at the same time. This study provides analytical approaches and experimental results that reveal insights into accelerator design for sparse CNNs. The authors show that all of these architectural aspects, including their mutual effects, must be considered together to avoid performance pitfalls. Based on the proposed analytical approach, they propose enhancement techniques and co-design the architecture across the factors discussed in this study. The improved architecture achieves up to 1.5× higher data reuse and/or up to 1.55× higher performance compared with state-of-the-art sparse CNN accelerators, while maintaining the same area and energy cost.
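The abstract refers to accelerators that skip work made unnecessary by sparsity in both activations and the network's weights. As a generic illustration only (not the authors' dataflow or microarchitecture), the hypothetical Python helper `conv2d_macs` below counts, for a single-channel 2-D convolution, how many multiply-accumulates (MACs) a dense engine would issue versus how many are effectual, i.e. have two non-zero operands. It assumes stride 1, no padding, and plain nested lists as inputs.

```python
def conv2d_macs(acts, weights):
    """Hypothetical helper: dense vs. effectual MAC count for a 2-D convolution.

    Assumes a single input/output channel, stride 1 and no padding.
    acts: H x W nested list; weights: R x S nested list.
    A MAC is counted as effectual only when both operands are non-zero,
    which is the work a zero-skipping accelerator would still have to do.
    """
    H, W = len(acts), len(acts[0])
    R, S = len(weights), len(weights[0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[0] * out_w for _ in range(out_h)]
    dense = effectual = 0
    for y in range(out_h):
        for x in range(out_w):
            acc = 0
            for r in range(R):
                for s in range(S):
                    a = acts[y + r][x + s]
                    w = weights[r][s]
                    dense += 1  # a dense engine issues every MAC
                    if a != 0 and w != 0:
                        effectual += 1  # only these change the result
                        acc += a * w
            out[y][x] = acc
    return out, dense, effectual


# Usage: sparse 3x3 activations convolved with a sparse 2x2 kernel.
out, dense, effectual = conv2d_macs(
    [[1, 0, 2], [0, 0, 3], [4, 0, 0]],
    [[1, 0], [0, 2]],
)
print(out, dense, effectual)  # [[1, 6], [0, 0]] 16 2
```

Here only 2 of the 16 MACs affect the output, which shows the potential saving from exploiting both kinds of sparsity. The count deliberately ignores the data-reuse and parallel-utilisation effects that the abstract lists as the other critical design concerns.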

Original language: English
Pages (from-to): 1185-1194
Number of pages: 10
Journal: IET Circuits, Devices and Systems
Volume: 14
Issue number: 8
State: Published - Nov 2020

Keywords

  • optimisation
  • convolutional neural nets
  • AI chips
  • neural net architecture
  • parallel architectures
  • data reuse
  • performance improvement
  • CNN accelerators
  • microarchitecture co-optimisation
  • distributed processing element accelerator
  • convolutional neural networks
  • sparse CNN
  • critical design concerns
  • parallel computing performance
  • sparse computation
