Enhancing utilization of SIMD-like accelerator for sparse convolutional neural networks

Bo-Cheng Lai*, Jyun Wei Pan, Chien Yu Lin

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

9 Scopus citations


Although existing single-instruction-multiple-data (SIMD)-like accelerators can handle the compressed format of sparse convolutional neural networks (CNNs), the sparse and irregular distribution of nonzero elements causes low utilization of the multipliers in a processing engine (PE) and imbalanced computation between PEs. This brief addresses these issues by proposing a data screening and task mapping (DSTM) accelerator, which integrates a series of techniques spanning software refinement and hardware modules. An efficient indexing module identifies the effectual computation pairs and skips unnecessary computation in a fine-grained manner. Weight data rearrangement alleviates the intra-PE load imbalance, and an effective task sharing mechanism further balances the computation between PEs. Compared with the state-of-the-art SIMD-like accelerator, the proposed DSTM enhances the average PE utilization by 3.5×, and the overall processing throughput is 59.7% higher than that of the previous design.
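
To make the notion of effectual computation pairs concrete, the following is a minimal software sketch and not the DSTM hardware design. It assumes a pair is effectual when both the weight and the activation are nonzero, and it models a PE as a fixed number of multiplier lanes that issue one batch of pairs per cycle. The function names, the lane count, and the sample vectors are hypothetical and chosen only for illustration.

```python
import math


def effectual_pairs(weights, activations):
    """Return the indices at which both operands are nonzero."""
    return [
        i
        for i, (w, a) in enumerate(zip(weights, activations))
        if w != 0 and a != 0
    ]


def multiplier_utilization(weights, activations, lanes, screen):
    """Fraction of multiplier slots that do useful work in a toy PE model.

    Without screening, every (weight, activation) pair is issued to a lane.
    With screening, only effectual pairs are issued (an assumed idealization).
    """
    useful = len(effectual_pairs(weights, activations))
    issued = useful if screen else len(weights)
    cycles = math.ceil(issued / lanes)
    if cycles == 0:
        return 0.0
    return useful / (cycles * lanes)


# Hypothetical sample data: 8 weight/activation pairs, 2 of them effectual.
weights = [0, 2, 0, 0, 3, 0, 1, 0]
activations = [5, 4, 0, 1, 0, 0, 2, 7]

pairs = effectual_pairs(weights, activations)
dense = multiplier_utilization(weights, activations, lanes=4, screen=False)
screened = multiplier_utilization(weights, activations, lanes=4, screen=True)
print(pairs, dense, screened)
```

In this toy model, skipping ineffectual pairs halves the number of cycles and doubles the multiplier utilization; the figures reported in the abstract come from the authors' hardware evaluation and are not reproduced by this sketch.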

Original language: English
Article number: 8644034
Pages (from-to): 1218-1222
Number of pages: 5
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Issue number: 5
State: Published - 1 May 2019


  • Load balance
  • Machine learning
  • Single-instruction-multiple-data (SIMD) architecture
  • Sparse convolutional neural networks (CNNs)

