Compressing DNN Parameters for Model Loading Time Reduction

Yang Ming Yeh, Jennifer Shueh Inn Hu, Yen Yu Lin, Yi Chang Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Deep neural networks (DNNs) have been applied to a variety of computer vision tasks in recent years. However, DNN inference often suffers from long execution times even with the aid of a GPU. In this paper, we argue that the bandwidth bottleneck between the GPU and GDRAM has to be addressed. To reduce loading time, we propose a DNN acceleration approach which compresses DNN parameters before loading model information to the GPU and performs decompression on the GPU. Using JPEG compression as an example, the loss of test accuracy can be kept within 4%, while an 8× reduction in parameter size is achieved for VGG16.
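The pipeline the abstract describes — lossily encode the weights on the host, ship the smaller payload across the bus, decode on the device — can be illustrated with a minimal sketch. This is not the authors' code: zlib stands in for the JPEG codec (the paper's actual decompression runs on the GPU), plain Python lists stand in for weight tensors, and all function names are hypothetical.

```python
# Hedged sketch of compress-before-load, decompress-after-load.
# Assumptions: 8-bit uniform quantization models JPEG's lossy step;
# zlib models the entropy-coding step; fp32 is the uncompressed baseline.
import random
import struct
import zlib

def quantize(weights, levels=256):
    """Map float weights to 8-bit codes (the lossy step)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1) or 1.0
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float weights from 8-bit codes."""
    return [lo + c * scale for c in codes]

def compress(weights):
    """Host side: quantize, entropy-code, prepend the dequant params."""
    codes, lo, scale = quantize(weights)
    return struct.pack("dd", lo, scale) + zlib.compress(codes)

def decompress(payload):
    """Device side: entropy-decode and dequantize the payload."""
    lo, scale = struct.unpack("dd", payload[:16])
    return dequantize(zlib.decompress(payload[16:]), lo, scale)

random.seed(0)
w = [random.gauss(0.0, 0.05) for _ in range(4096)]   # stand-in layer weights
payload = compress(w)
w_hat = decompress(payload)
ratio = (4 * len(w)) / len(payload)                   # vs. fp32 storage
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"compression ratio ~ {ratio:.1f}x, max abs error ~ {max_err:.4f}")
```

In this toy setting the gain comes mostly from the fp32-to-8-bit quantization; the paper's reported 8× reduction for VGG16 relies on the full JPEG transform, which additionally exploits spatial redundancy in the weight matrices.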

Original language: English
Title of host publication: 2019 IEEE International Conference on Consumer Electronics - Asia, ICCE-Asia 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 78-79
Number of pages: 2
ISBN (Electronic): 9781728133362
DOIs
State: Published - Jun 2019
Event: 4th IEEE International Conference on Consumer Electronics - Asia, ICCE-Asia 2019 - Bangkok, Thailand
Duration: 12 Jun 2019 - 14 Jun 2019

Publication series

Name: 2019 IEEE International Conference on Consumer Electronics - Asia, ICCE-Asia 2019

Conference

Conference: 4th IEEE International Conference on Consumer Electronics - Asia, ICCE-Asia 2019
Country/Territory: Thailand
City: Bangkok
Period: 12/06/19 - 14/06/19

Keywords

  • DNN
  • architecture
  • compression
