TY - GEN
T1 - Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm
AU - Wang, Syu Siang
AU - Hwang, Hsin Te
AU - Lai, Ying Hui
AU - Tsao, Yu
AU - Lu, Xugang
AU - Wang, Hsin Min
AU - Su, Borching
N1 - Publisher Copyright: © 2015 Asia-Pacific Signal and Information Processing Association.
PY - 2016/2/19
Y1 - 2016/2/19
N2 - This paper investigates the use of the speech parameter generation (SPG) algorithm, which has been successfully adopted in deep neural network (DNN)-based voice conversion (VC) and speech synthesis (SS), to incorporate temporal information and improve deep denoising auto-encoder (DDAE)-based speech enhancement. Our previous studies confirmed that the DDAE can effectively suppress noise components in noise-corrupted speech. However, because the DDAE converts speech frame by frame, the enhanced speech shows some discontinuity even when context features are used as input to the DDAE. To address this issue, this study proposes using the SPG algorithm as a post-processor that transforms the DDAE-processed feature sequence into one with a smoothed trajectory. Two types of temporal information are investigated with SPG: static-dynamic features and context features. Experimental results show that SPG with context features outperforms both SPG with static-dynamic features and the baseline system, which uses context features without SPG, on standardized objective tests across different noise types and SNRs.
UR - http://www.scopus.com/inward/record.url?scp=84986208177&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2015.7415295
DO - 10.1109/APSIPA.2015.7415295
M3 - Conference contribution
AN - SCOPUS:84986208177
T3 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
SP - 365
EP - 369
BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Y2 - 16 December 2015 through 19 December 2015
ER -