TY - GEN
T1 - Digital Computation-in-Memory Design with Adaptive Floating Point for Deep Neural Networks
AU - Yang, Yun Ru
AU - Lu, Wei
AU - Huang, Po Tsang
AU - Chen, Hung Ming
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - All-digital deep neural network (DNN) accelerators and processors suffer from the von Neumann bottleneck because of the massive data movement that DNNs require. Computation-in-memory (CIM) alleviates this bottleneck by performing computations inside the memory, reducing data movement. However, analog CIM is susceptible to PVT variations and is limited by analog-to-digital and digital-to-analog conversions (ADC/DAC). Most current digital CIM techniques adopt integer operations with a bit-serial method, which limits throughput by a factor of the total number of bits. Moreover, they use adder trees for accumulation, which incurs severe area overhead. In this paper, a folded architecture based on time-division multiplexing is proposed to reduce area and improve energy efficiency without reducing throughput. We quantize and ternarize the adaptive floating-point (ADP) format at low bit widths, which achieves the same or better accuracy than integer quantization, to reduce the energy cost of computation and data movement. The proposed technique improves overall throughput and energy efficiency by up to 3.83x and 2.19x, respectively, compared with state-of-the-art integer-based digital CIMs.
KW - adaptive floating point
KW - digital computation-in-memory
KW - folded architecture
KW - time interleaving
KW - time-division multiplexing
UR - http://www.scopus.com/inward/record.url?scp=85147442281&partnerID=8YFLogxK
DO - 10.1109/MCSoC57363.2022.00042
M3 - Conference contribution
AN - SCOPUS:85147442281
T3 - Proceedings - 2022 IEEE 15th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2022
SP - 216
EP - 223
BT - Proceedings - 2022 IEEE 15th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2022
Y2 - 19 December 2022 through 22 December 2022
ER -