A Novel Number Representation and Its Hardware Support for Accurate Low-Bit Quantization on Large Recommender Systems

Yu Da Chu*, Pei Hsuan Kuo, Lyu Ming Ho, Juinn Dar Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Deep learning based recommender systems with large embedding tables have become pivotal for web content recommendation. However, the growing size of those tables, reaching tens of gigabytes or even terabytes, presents a tough challenge for conducting inferences on resource-constrained hardware. In this paper, we present a novel 6-bit fixed-point number representation format for more precise quantization on recommender models. The proposed format is specifically designed to accommodate the nonuniform weight distribution inside those huge embedding tables. To further alleviate the model size, the well-known K-means quantization technique is utilized for 4-bit quantization and beyond. Moreover, we also propose dedicated hardware decoder architectures for both 6-bit and 4-bit quantization to ensure efficient runtime inference. Experimental results show that the proposed low-bit (8~3-bit) quantization techniques on embedding tables yield 4~10.7x model size reduction with minor accuracy loss as compared to the original FP32 model. Therefore, the proposed number representation format and low-bit quantization techniques can effectively and drastically reduce the model size of large recommender systems with a very low area cost while still keeping the accuracy loss minimized.
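As a rough illustration of the codebook idea mentioned in the abstract, the sketch below applies plain one-dimensional K-means to a list of weights and stores each weight as a 4-bit index into a 16-entry codebook. It is not the authors' implementation: the function names (`kmeans_1d`, `quantize`, `dequantize`, `compression_ratio`), the Gaussian toy weights, and the choice to ignore codebook storage are all assumptions made for this example, and the 6-bit fixed-point format is not shown because the abstract does not specify it.

```python
import random


def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values, k, iters=10, seed=0):
    """Plain Lloyd's algorithm on scalars; returns k centroids in ascending order."""
    rng = random.Random(seed)
    centroids = sorted(rng.sample(values, k))
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for v in values:
            idx = nearest(v, centroids)
            sums[idx] += v
            counts[idx] += 1
        # An empty cluster keeps its previous centroid.
        centroids = sorted(
            sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)
        )
    return centroids


def quantize(values, centroids):
    """Replace every weight by the index of its nearest centroid."""
    return [nearest(v, centroids) for v in values]


def dequantize(codes, centroids):
    """Look each index up in the codebook to recover approximate weights."""
    return [centroids[c] for c in codes]


def compression_ratio(bits, baseline_bits=32):
    """Size reduction versus FP32, ignoring the (small) codebook itself."""
    return baseline_bits / bits


# Toy data: a bell-shaped (nonuniform) weight distribution, assumed for illustration.
rng = random.Random(1)
weights = [rng.gauss(0.0, 0.1) for _ in range(1000)]

codebook = kmeans_1d(weights, k=16)   # 16 centroids -> 4-bit indices
codes = quantize(weights, codebook)
restored = dequantize(codes, codebook)

print(compression_ratio(8))            # 32 / 8 = 4.0
print(round(compression_ratio(3), 1))  # 32 / 3 rounds to 10.7
```

Under these assumptions, the per-weight ratio of 32 bits to 8 or 3 bits reproduces the 4x and 10.7x endpoints quoted in the abstract; a real embedding table would also need to account for codebook storage and any per-row scaling.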

Original language: English
Title of host publication: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 437-441
Number of pages: 5
ISBN (Electronic): 9798350383638
State: Published - 2024
Event: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024 - Abu Dhabi, United Arab Emirates
Duration: 22 Apr 2024 to 25 Apr 2024

Publication series

Name: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings

Conference

Conference: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 22/04/24 to 25/04/24

Keywords

  • number representation
  • quantization
  • recommendation system
