A 2.25 TOPS/W fully-integrated deep CNN learning processor with on-chip training

Cheng Hsun Lu, Yi Chung Wu, Chia Hsiang Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

This paper presents a deep learning processor that supports both inference and training for an entire convolutional neural network (CNN) of any size. The proposed design enables on-chip training for applications that demand high security and privacy. Techniques across design abstraction levels are applied to improve energy efficiency. Rearranging the weights in filters reduces the processing latency by 88%. Integrating fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. The max-pooling and ReLU modules are co-designed to reduce memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40 nm CMOS, the chip consumes 18.7 mW and 64.5 mW for inference and training, respectively, at 82 MHz from a 0.6 V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than that of state-of-the-art learning processors. The chip also achieves 2×10^5 times higher energy efficiency in training than a high-end CPU.
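The headline figures in the abstract can be related to one another with some simple arithmetic. The sketch below is illustrative only: it assumes the 2.25 TOPS/W figure applies at the inference operating point (18.7 mW at 82 MHz), which the abstract does not state explicitly, and the function names are hypothetical helpers rather than anything from the paper.

```python
# Back-of-the-envelope figures derived from the abstract's headline numbers.
# ASSUMPTION: the 2.25 TOPS/W efficiency is measured at the inference
# operating point (18.7 mW, 82 MHz). The abstract does not confirm this.

EFFICIENCY_TOPS_PER_W = 2.25   # reported energy efficiency
INFERENCE_POWER_W = 18.7e-3    # reported inference power
CLOCK_HZ = 82e6                # reported clock frequency
SOTA_RATIO = 2.67              # reported gain over prior learning processors


def energy_per_op_joules(tops_per_watt):
    """Energy per operation implied by an efficiency in TOPS/W."""
    return 1.0 / (tops_per_watt * 1e12)


def throughput_ops_per_s(tops_per_watt, power_w):
    """Throughput implied by an efficiency and a power draw."""
    return tops_per_watt * 1e12 * power_w


def ops_per_cycle(ops_per_s, clock_hz):
    """Average operations per clock cycle at a given frequency."""
    return ops_per_s / clock_hz


def implied_baseline_tops_per_w(tops_per_watt, ratio):
    """Efficiency of the comparison point implied by an 'N times higher' claim."""
    return tops_per_watt / ratio


throughput = throughput_ops_per_s(EFFICIENCY_TOPS_PER_W, INFERENCE_POWER_W)
print(f"energy per op : {energy_per_op_joules(EFFICIENCY_TOPS_PER_W) * 1e12:.3f} pJ")
print(f"throughput    : {throughput / 1e9:.1f} GOPS")
print(f"ops per cycle : {ops_per_cycle(throughput, CLOCK_HZ):.0f}")
print(f"prior best    : {implied_baseline_tops_per_w(EFFICIENCY_TOPS_PER_W, SOTA_RATIO):.2f} TOPS/W")
```

Under that assumption, the numbers work out to roughly 0.44 pJ per operation and about 42 GOPS of throughput, with the comparison processors sitting near 0.84 TOPS/W. These are derived estimates, not values reported by the authors.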

Original language: English
Title of host publication: Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-68
Number of pages: 4
ISBN (Electronic): 9781728151069
State: Published - Nov 2019
Event: 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 - Macao, China
Duration: 4 Nov 2019 to 6 Nov 2019

Publication series

Name: Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
Volume: 2019-November

Conference

Conference: 15th IEEE Asian Solid-State Circuits Conference, A-SSCC 2019
Country/Territory: China
City: Macao
Period: 4/11/19 to 6/11/19

Keywords

  • CMOS digital integrated circuits
  • Convolutional neural network
  • Deep learning
  • Specialized processor
