Software and hardware enhancement of convolutional neural networks on GPGPUs

An Ting Cheng*, Chun Yen Chen, Bo-Cheng Lai, Che Huai Lin

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Convolutional Neural Networks (CNNs) have gained attention in recent years for their ability to perform complex machine learning tasks with high accuracy and resilience to input noise. The time-consuming convolution operations of CNNs pose great challenges to both software and hardware designers. To achieve superior performance, a design must carefully balance exposing the massive computation parallelism against exploiting data reuse within complex data-access patterns. Existing designs lack a comprehensive analysis of design techniques and decisions. The analytical reasoning and quantitative evidence behind design criteria, such as choosing the proper dimensions to parallelize, have not been well studied. This paper performs a series of qualitative and quantitative studies on both the programming techniques and their implications for the GPU architecture. The observations reveal a comprehensive understanding of the correlation between design techniques and the resulting performance. Based on these analyses, we pinpoint the two major performance bottlenecks of CNNs on GPGPUs: performing computation and loading data from global memory. Software and hardware enhancements are proposed in this paper to alleviate these issues. Experimental results on a cycle-accurate GPGPU simulator demonstrate up to a 4.4x performance improvement over the reference design.
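The trade-off the abstract describes can be seen in the structure of the convolution itself. The sketch below (illustrative only, not code from the paper) is a naive single-channel 2D convolution: every output element is independent, which exposes the massive parallelism a GPU can exploit, while each input element is read up to K×K times across neighboring outputs, which is the data-reuse opportunity that shared-memory tiling targets.

```python
# Illustrative sketch (not from the paper): naive 2D "valid" convolution.
# - Each output position (i, j) is independent -> parallelizable across threads.
# - Each input pixel is reloaded up to K*K times -> data-reuse opportunity
#   that GPU kernels typically capture with shared-memory tiling.

def conv2d(x, w):
    """Valid convolution of a single-channel image x with a K x K kernel w."""
    H, W = len(x), len(x[0])
    K = len(w)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):        # independent output rows
        for j in range(W - K + 1):    # independent output columns
            acc = 0.0
            for ki in range(K):
                for kj in range(K):
                    acc += x[i + ki][j + kj] * w[ki][kj]
            out[i][j] = acc
    return out

# Example: 3x3 input, 2x2 kernel -> 2x2 output.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d(image, kernel))  # [[6.0, 8.0], [12.0, 14.0]]
```

In a GPU mapping, each `(i, j)` loop iteration would become a thread, and a thread block would stage its input tile in shared memory so the K×K overlapping reads hit fast on-chip storage instead of global memory, which is one of the two bottlenecks the paper identifies.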

Original language: English
Pages (from-to): 28-39
Number of pages: 12
Journal: Advances in Science, Technology and Engineering Systems
Issue number: 2
State: Published - 1 Jan 2018


Keywords
  • Convolutional neural network
  • Design and optimization


