TY - GEN
T1 - VESTA
T2 - 20th IEEE Asia Pacific Conference on Circuits and Systems and IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, APCCAS and PrimeAsia 2024
AU - Chen, Ching Yao
AU - Chen, Meng Chieh
AU - Chang, Tian Sheuan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and ability to capture feature dependencies, respectively. However, transformer architectures typically involve multiple types of computational layers, including linear layers for MLP modules and classification heads, convolution layers for tokenizers, and dot product computations for self-attention mechanisms. These diverse operations pose significant challenges for hardware accelerator design, and to our knowledge, there is not yet a hardware solution that leverages spike-form data from SNNs for transformer architectures. In this paper, we introduce VESTA, a novel hardware design that synergizes these technologies, presenting unified Processing Elements (PEs) capable of efficiently performing all three types of computations crucial to transformer structures. VESTA uniquely benefits from the spike-form outputs of the Spike Neuron Layers [1], simplifying multiplication operations by reducing them from handling two 8-bit integers to handling one 8-bit integer and a binary spike. This reduction enables the use of multiplexers in the PE module, significantly enhancing computational efficiency while maintaining the low-power advantage of SNNs. Experimental results show that the core area of VESTA is 0.844 mm². It operates at 500 MHz and is capable of real-time image classification at 30 fps.
AB - Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and ability to capture feature dependencies, respectively. However, transformer architectures typically involve multiple types of computational layers, including linear layers for MLP modules and classification heads, convolution layers for tokenizers, and dot product computations for self-attention mechanisms. These diverse operations pose significant challenges for hardware accelerator design, and to our knowledge, there is not yet a hardware solution that leverages spike-form data from SNNs for transformer architectures. In this paper, we introduce VESTA, a novel hardware design that synergizes these technologies, presenting unified Processing Elements (PEs) capable of efficiently performing all three types of computations crucial to transformer structures. VESTA uniquely benefits from the spike-form outputs of the Spike Neuron Layers [1], simplifying multiplication operations by reducing them from handling two 8-bit integers to handling one 8-bit integer and a binary spike. This reduction enables the use of multiplexers in the PE module, significantly enhancing computational efficiency while maintaining the low-power advantage of SNNs. Experimental results show that the core area of VESTA is 0.844 mm². It operates at 500 MHz and is capable of real-time image classification at 30 fps.
UR - http://www.scopus.com/inward/record.url?scp=85216109594&partnerID=8YFLogxK
U2 - 10.1109/APCCAS62602.2024.10808457
DO - 10.1109/APCCAS62602.2024.10808457
M3 - Conference contribution
AN - SCOPUS:85216109594
T3 - APCCAS and PrimeAsia 2024 - 2024 IEEE 20th Asia Pacific Conference on Circuits and Systems and IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, Proceedings
SP - 6
EP - 10
BT - APCCAS and PrimeAsia 2024 - 2024 IEEE 20th Asia Pacific Conference on Circuits and Systems and IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 November 2024 through 9 November 2024
ER -