TY - GEN
T1 - Hardware-Friendly Activation Function Designs and Its Efficient VLSI Implementations for Transformer-Based Applications
AU - Huang, Yu Hsiang
AU - Kuo, Pei Hsuan
AU - Huang, Juinn Dar
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The activation function is one of the key elements in modern machine learning algorithms. However, some broadly used activation functions, e.g., GELU in Transformer-based algorithms, are exceptionally complex, which makes their precise yet efficient VLSI implementation extremely hard. In this paper, two series of hardware-friendly activation function designs, DNR and PWL, and their VLSI implementations are proposed. Both are specifically designed to replace GELU, which is widely used in Transformer-related applications. Instead of relying on traditional lookup-table (LUT)-based approximation methods, this paper introduces new activation functions that are not only hardware-friendly but also alleviate the dying neuron issue. In addition, each series includes a number of members that can be freely selected through programming to best fit a given application. Experimental results indicate that the proposed activation functions achieve comparable or even better model accuracy than GELU. Moreover, the highly efficient and flexible VLSI implementations support 16 different Q-formats to maximize output precision under various input scales. Compared with approximation-based implementation strategies, the proposed activation function designs and the corresponding LUT-free hardware implementations achieve significant improvements in speed, area, and power.
AB - The activation function is one of the key elements in modern machine learning algorithms. However, some broadly used activation functions, e.g., GELU in Transformer-based algorithms, are exceptionally complex, which makes their precise yet efficient VLSI implementation extremely hard. In this paper, two series of hardware-friendly activation function designs, DNR and PWL, and their VLSI implementations are proposed. Both are specifically designed to replace GELU, which is widely used in Transformer-related applications. Instead of relying on traditional lookup-table (LUT)-based approximation methods, this paper introduces new activation functions that are not only hardware-friendly but also alleviate the dying neuron issue. In addition, each series includes a number of members that can be freely selected through programming to best fit a given application. Experimental results indicate that the proposed activation functions achieve comparable or even better model accuracy than GELU. Moreover, the highly efficient and flexible VLSI implementations support 16 different Q-formats to maximize output precision under various input scales. Compared with approximation-based implementation strategies, the proposed activation function designs and the corresponding LUT-free hardware implementations achieve significant improvements in speed, area, and power.
KW - GELU
KW - dying neuron issue
KW - efficient VLSI implementation
KW - hardware-friendly activation function design
UR - http://www.scopus.com/inward/record.url?scp=85166375483&partnerID=8YFLogxK
U2 - 10.1109/AICAS57966.2023.10168591
DO - 10.1109/AICAS57966.2023.10168591
M3 - Conference contribution
AN - SCOPUS:85166375483
T3 - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceedings
BT - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2023
Y2 - 11 June 2023 through 13 June 2023
ER -
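
Note: the sketch below is not part of the record above; it only illustrates, in Python, the general idea the abstract describes — replacing GELU with a LUT-free piecewise-linear activation and quantizing its output to a selectable Q-format. The knot placement, saturation bounds, and the pwl_gelu / to_q_format helpers are illustrative assumptions and do not reproduce the paper's DNR or PWL designs.

import numpy as np

def gelu(x):
    # Reference GELU (tanh approximation), the activation the paper targets.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def pwl_gelu(x, num_segments=8, lo=-4.0, hi=4.0):
    # Hypothetical LUT-free piecewise-linear stand-in for GELU (not the paper's
    # DNR/PWL family): linear segments between evenly spaced knots, a near-zero
    # negative tail, and identity for large positive inputs.
    knots = np.linspace(lo, hi, num_segments + 1)
    y = np.interp(x, knots, gelu(knots))
    y = np.where(x < lo, 0.0, y)   # GELU(x) -> 0 as x -> -inf
    y = np.where(x > hi, x, y)     # GELU(x) -> x as x -> +inf
    return y

def to_q_format(x, int_bits, frac_bits):
    # Illustrative signed Qm.n fixed-point quantization of the activation output
    # (a sketch of supporting multiple Q-formats, not the paper's datapath).
    scale = 2.0 ** frac_bits
    q = np.round(x * scale) / scale
    return np.clip(q, -2.0 ** int_bits, 2.0 ** int_bits - 1.0 / scale)

# Example: evaluate the piecewise-linear surrogate and quantize to Q3.12.
x = np.linspace(-6.0, 6.0, 7)
print(to_q_format(pwl_gelu(x), int_bits=3, frac_bits=12))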