TY - JOUR
T1 - A 2.17-mW Acoustic DSP Processor With CNN-FFT Accelerators for Intelligent Hearing Assistive Devices
AU - Lee, Yu-Chi
AU - Chi, Tai-Shih
AU - Yang, Chia-Hsiang
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
AB - This article presents an acoustic DSP processor containing a neural network core for intelligent hearing assistive devices. The processor includes accelerators for convolutional neural networks (CNNs) and the fast Fourier transform (FFT). A CNN-based speech enhancement algorithm predicts the desired mask for the Fourier spectrogram of the speech signal to enhance speech intelligibility. Several design techniques are applied to enable efficient hardware mapping. The computational complexity of the CNN is reduced by 23.6% through frame sharing, and a fast mask generation plus partial-sum pre-computation technique further reduces output latency by up to 64%. The model memory size is reduced by 75% through weight quantization. The FFT is implemented with a packing algorithm that reduces its computational complexity by 43%. Reconfigurable processing elements are shared between the FFT and the CNN, yielding an area saving of 42%. In addition, input sharing and output sharing reduce data movement by 94% and 75%, respectively. A reordered FFT structure also eliminates up to 256 multiplexers. Fabricated in a 40-nm CMOS technology, the chip has a core area of 4.2 mm² and dissipates 2.17 mW at a 5-MHz clock frequency from a 0.6-V supply. The embedded CNN accelerator supports both convolutional and fully connected (FC) layers and achieves energy efficiency comparable to state-of-the-art CNN accelerators, despite its added flexibility for FFT. Speech intelligibility is enhanced by up to 41% in the low-SNR regime.
KW - CMOS integrated circuits
KW - convolutional neural network (CNN)
KW - fast Fourier transform (FFT)
KW - reconfigurable architecture
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85090221404&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2020.2987695
DO - 10.1109/JSSC.2020.2987695
M3 - Article
SN - 0018-9200
VL - 55
SP - 2247
EP - 2258
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 8
M1 - 9082141
ER -