A Novel Number Representation and Its Hardware Support for Accurate Low-Bit Quantization on Large Recommender Systems

Yu Da Chu*, Pei Hsuan Kuo, Lyu Ming Ho, Juinn Dar Huang

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

Abstract

Deep-learning-based recommender systems with large embedding tables have become pivotal for web content recommendation. However, the growing size of those tables, which can reach tens of gigabytes or even terabytes, poses a serious challenge for inference on resource-constrained hardware. In this paper, we present a novel 6-bit fixed-point number representation format for more precise quantization of recommender models. The proposed format is specifically designed to accommodate the nonuniform weight distribution inside those huge embedding tables. To further reduce the model size, the well-known K-means quantization technique is used for 4-bit and lower-bit quantization. Moreover, we propose dedicated hardware decoder architectures for both 6-bit and 4-bit quantization to ensure efficient runtime inference. Experimental results show that the proposed low-bit (8-bit down to 3-bit) quantization techniques on embedding tables yield a 4x to 10.7x model size reduction with minor accuracy loss compared to the original FP32 model. The proposed number representation format and low-bit quantization techniques can therefore drastically reduce the model size of large recommender systems at a very low area cost while keeping the accuracy loss small.
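The K-means step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic Gaussian weights, the helper names (`kmeans_1d`, `quantize`, `dequantize`, `compression_ratio`), and the iteration count are all assumptions made for illustration, and the proposed 6-bit format is not reproduced because the record does not specify it. The sketch clusters scalar weights into 16 centroids (a 4-bit codebook), stores one 4-bit index per weight, and restores values by table lookup. It also checks the arithmetic behind the reported range: ignoring codebook overhead, 32/8 = 4x and 32/3 ≈ 10.7x.

```python
import random

def kmeans_1d(values, k, iters=10, seed=0):
    """Lloyd's algorithm on scalars; returns a sorted list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # initialize from k data points
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for v in values:
            idx = min(range(k), key=lambda j: abs(v - centroids[j]))
            sums[idx] += v
            counts[idx] += 1
        # Keep the old centroid when a cluster receives no points.
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j]
                     for j in range(k)]
    return sorted(centroids)

def quantize(values, centroids):
    """Map each weight to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]

def dequantize(codes, centroids):
    """Restore approximate weights with a plain table lookup."""
    return [centroids[c] for c in codes]

def compression_ratio(bits, baseline_bits=32):
    """Size reduction versus FP32, ignoring the (small) codebook itself."""
    return baseline_bits / bits

# Hypothetical embedding weights: synthetic, zero-centered, nonuniform.
rng = random.Random(1)
weights = [rng.gauss(0.0, 0.05) for _ in range(1000)]

codebook = kmeans_1d(weights, k=16)   # 16 entries -> 4-bit indices
codes = quantize(weights, codebook)
restored = dequantize(codes, codebook)

mean_abs_err = sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)
print("mean absolute error:", mean_abs_err)
print(compression_ratio(8), round(compression_ratio(3), 1))  # 4.0 10.7
```

The lookup in `dequantize` is only a software analogue of a codebook decoder. The record does not describe the dedicated hardware decoder architectures, so nothing here should be read as their design.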

Original language: English
Title of host publication: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 437-441
Number of pages: 5
ISBN (Electronic): 9798350383638
Publication status: Published - 2024
Event: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024 - Abu Dhabi, United Arab Emirates
Duration: 22 Apr 2024 to 25 Apr 2024

Publication series

Name: 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings

Conference

Conference: 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 22/04/24 to 25/04/24
