TY - JOUR
T1 - Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization
AU - Wang, Syu Siang
AU - Chern, Alan
AU - Tsao, Yu
AU - Hung, Jeih Weih
AU - Lu, Xugang
AU - Lai, Ying Hui
AU - Su, Borching
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/8
Y1 - 2016/8
N2 - In state-of-the-art speech enhancement (SE) techniques, a spectrogram is usually preferred over the corresponding time-domain raw data, since it provides a more compact representation together with conspicuous temporal information over a long time span. However, two problems can cause distortions in conventional nonnegative matrix factorization (NMF)-based SE algorithms. One is related to the overlap-and-add operation used in short-time Fourier transform (STFT)-based signal reconstruction, and the other concerns directly using the phase of the noisy speech as that of the enhanced speech in signal reconstruction. These two problems can cause information loss or discontinuity in the reconstructed signal relative to the clean signal. To solve these two problems, we propose a novel SE method that adopts the discrete wavelet packet transform (DWPT) and NMF. In brief, the DWPT is first applied to split a time-domain speech signal into a series of subband signals. Then, we exploit NMF to highlight the speech component in each subband. These enhanced subband signals are joined together via the inverse DWPT to reconstruct a noise-reduced signal in the time domain. We evaluate the proposed DWPT-NMF-based SE method on the Mandarin Hearing in Noise Test (MHINT) task. Experimental results show that this new method effectively enhances speech quality and intelligibility and outperforms the conventional STFT-NMF-based SE system.
KW - Discrete wavelet packet transform (DWPT)
KW - nonnegative matrix factorization (NMF)
KW - short-time Fourier transform (STFT)
KW - speech enhancement (SE)
UR - http://www.scopus.com/inward/record.url?scp=84979916738&partnerID=8YFLogxK
U2 - 10.1109/LSP.2016.2571727
DO - 10.1109/LSP.2016.2571727
M3 - Article
AN - SCOPUS:84979916738
SN - 1070-9908
VL - 23
SP - 1101
EP - 1105
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
IS - 8
M1 - 7476850
ER -