TY - GEN
T1 - A Hardware-Friendly Alternative to Softmax Function and Its Efficient VLSI Implementation for Deep Learning Applications
AU - Hsieh, Meng Hsun
AU - Li, Xuan Hong
AU - Huang, Yu Hsiang
AU - Kuo, Pei Hsuan
AU - Huang, Juinn Dar
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The Softmax function plays an essential role in most machine learning algorithms. Conventional realizations of Softmax require computationally intensive exponential operations and divisions, posing formidable challenges for low-cost hardware implementation. This paper presents a promising hardware-friendly alternative, Squaremax, which eliminates the complex exponential operations. Its definition is extremely simple and can therefore be implemented efficiently in both software and hardware. Experimental results show that Squaremax consistently attains comparable or superior accuracy across several popular models. In addition, this paper proposes an efficient hardware architecture for Squaremax. It requires no functional units for exponential or logarithmic operations and is even lookup-table (LUT) free. It adopts a flexible 16-bit fixed-point Q format for I/O to better preserve output precision, which leads to higher model accuracy. Moreover, it yields substantial improvements in speed, area, and power, achieving remarkable area and power efficiency of 664 G/mm² and 1396 G/W in a 40 nm process. Hardware-friendly Squaremax is therefore a very promising alternative to the complex Softmax for deep learning applications in both software and hardware, and the proposed LUT-free hardware architecture delivers notable improvements in speed, area, and power.
KW - efficient VLSI implementation
KW - hardware-friendly activation function design
KW - Softmax
UR - http://www.scopus.com/inward/record.url?scp=85198521952&partnerID=8YFLogxK
DO - 10.1109/ISCAS58744.2024.10558086
M3 - Conference contribution
AN - SCOPUS:85198521952
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - ISCAS 2024 - IEEE International Symposium on Circuits and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Y2 - 19 May 2024 through 22 May 2024
ER -