TY - JOUR
T1 - ASC: Adaptive Scale Feature Map Compression for Deep Neural Network
AU - Yao, Yuan
AU - Chang, Tian Sheuan
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Deep-learning accelerators are increasingly in demand; however, their performance is constrained by feature map size, which drives up bandwidth requirements and buffer sizes. We propose an adaptive scale feature map compression technique that leverages the unique properties of feature maps: it adopts independent channel indexing, given the weak correlation between channels, and a cubical-like block shape to exploit strong local correlations. Compression is further optimized with a switchable endpoint mode and adaptive scale interpolation, which handle unimodal data distributions both with and without outliers. This yields a 4× compression rate at constant bitrate and up to 7.69× at variable bitrate for 16-bit data. Our hardware design minimizes area cost by adjusting interpolation scales, which enables hardware sharing among interpolation points, and introduces a threshold concept for straightforward interpolation that avoids intricate hardware. A TSMC 28 nm implementation of the 8-bit version has an equivalent gate count of 6135. The architecture also scales effectively, with only a sublinear increase in area cost: a 32× throughput increase, matching the theoretical bandwidth of DDR5-6400, costs just 7.65× the hardware.
KW - Compression
KW - deep learning
KW - feature maps
KW - hardware acceleration
UR - http://www.scopus.com/inward/record.url?scp=85180308596&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2023.3337283
DO - 10.1109/TCSI.2023.3337283
M3 - Article
AN - SCOPUS:85180308596
SN - 1549-8328
VL - 71
SP - 1417
EP - 1428
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 3
ER -