TY - GEN
T1 - Configurable Multi-Precision Floating-Point Multiplier Architecture Design for Computation in Deep Learning
AU - Kuo, Pei Hsuan
AU - Huang, Yu Hsiang
AU - Huang, Juinn Dar
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The growing number of AI applications demands efficient computing capability to support a huge amount of calculation. Among the related arithmetic operations, multiplication is indispensable in most deep learning applications. To support computation at the various precisions demanded by different applications, a multiplier architecture must meet the multi-precision requirement while still achieving high utilization of the multiplication array and high power efficiency. In this paper, a configurable multi-precision floating-point (FP) multiplier architecture with minimized redundant bits is presented. It can execute 16× FP8, 8× brain-floating-point (BF16), 4× half-precision (FP16), or 1× single-precision (FP32) operations every cycle while maintaining a 100% multiplication hardware utilization ratio. Moreover, the computation results can also be represented in higher-precision formats for subsequent high-precision computations. The proposed design has been implemented in a TSMC 40 nm process, operates at a 1 GHz clock frequency, and consumes only 16.78 mW on average. Compared to existing multi-precision FP multiplier architectures, the proposed design achieves the highest hardware utilization ratio with only 4.9K logic gates in the multiplication array. It also achieves high energy efficiencies of 1212.1, 509.6, 207.3, and 42.6 GFLOPS/W in FP8, BF16, FP16, and FP32 modes, respectively.
AB - The growing number of AI applications demands efficient computing capability to support a huge amount of calculation. Among the related arithmetic operations, multiplication is indispensable in most deep learning applications. To support computation at the various precisions demanded by different applications, a multiplier architecture must meet the multi-precision requirement while still achieving high utilization of the multiplication array and high power efficiency. In this paper, a configurable multi-precision floating-point (FP) multiplier architecture with minimized redundant bits is presented. It can execute 16× FP8, 8× brain-floating-point (BF16), 4× half-precision (FP16), or 1× single-precision (FP32) operations every cycle while maintaining a 100% multiplication hardware utilization ratio. Moreover, the computation results can also be represented in higher-precision formats for subsequent high-precision computations. The proposed design has been implemented in a TSMC 40 nm process, operates at a 1 GHz clock frequency, and consumes only 16.78 mW on average. Compared to existing multi-precision FP multiplier architectures, the proposed design achieves the highest hardware utilization ratio with only 4.9K logic gates in the multiplication array. It also achieves high energy efficiencies of 1212.1, 509.6, 207.3, and 42.6 GFLOPS/W in FP8, BF16, FP16, and FP32 modes, respectively.
KW - configurable
KW - deep learning
KW - floating-point multiplier design
KW - multi-precision computation
UR - http://www.scopus.com/inward/record.url?scp=85166368387&partnerID=8YFLogxK
U2 - 10.1109/AICAS57966.2023.10168572
DO - 10.1109/AICAS57966.2023.10168572
M3 - Conference contribution
AN - SCOPUS:85166368387
T3 - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceedings
BT - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2023
Y2 - 11 June 2023 through 13 June 2023
ER -