TY - JOUR
T1 - Dataflow and microarchitecture co-optimisation for sparse CNN on distributed processing element accelerator
AU - Pham, Duc-An
AU - Lai, Bo-Cheng
PY - 2020/11
AB - Accelerators that utilise the sparsity of both the activation data and the network structure of convolutional neural networks (CNNs) have demonstrated efficient CNN processing with superior performance. Previous studies have identified three critical concerns in the design of accelerators for sparse CNNs: data reuse, parallel computing performance, and effective sparse computation. Each of these factors has been addressed by prior accelerator designs, but no design has considered all of them at the same time. This study provides analytical approaches and experimental results that offer insight into accelerator design for sparse CNNs. The authors show that all of these architectural aspects, including their mutual effects, must be considered together to avoid performance pitfalls. Based on the proposed analytical approach, they propose enhancement techniques co-designed across the factors discussed in this study. The improved architecture achieves up to 1.5x more data reuse and/or a 1.55x performance improvement over state-of-the-art sparse CNN accelerators while maintaining equal area and energy cost.
KW - optimisation
KW - convolutional neural nets
KW - AI chips
KW - neural net architecture
KW - parallel architectures
KW - data reuse
KW - performance improvement
KW - CNN accelerators
KW - microarchitecture co-optimisation
KW - distributed processing element accelerator
KW - convolutional neural networks
KW - sparse CNN
KW - critical design concerns
KW - parallel computing performance
KW - sparse computation
DO - 10.1049/iet-cds.2019.0225
M3 - Article
SN - 1751-858X
VL - 14
SP - 1185
EP - 1194
JF - IET Circuits, Devices & Systems
IS - 8
ER -