TY - GEN
T1 - Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition
AU - Shen, Yih Liang
AU - Huang, Chao Yuan
AU - Wang, Syu Siang
AU - Tsao, Yu
AU - Wang, Hsin Min
AU - Chi, Tai-Shih
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between the enhanced speech and the clean reference. An MSE-optimized model may not directly improve the performance of an automatic speech recognition (ASR) system. If the goal is to minimize the recognition error, the recognition results should be used to design the objective function for optimizing the SE model. However, an ASR system consists of multiple units, such as acoustic and language models, and its structure is usually complex and not differentiable. In this study, we propose adopting a reinforcement learning (RL) algorithm to optimize the SE model based on the recognition results. We evaluated the proposed RL-based SE system on the Mandarin Chinese broadcast news corpus (MATBN). Experimental results demonstrate that the proposed SE system can effectively improve the ASR results, with notable error rate reductions of 12.40% and 19.23% at signal-to-noise ratios (SNRs) of 0 dB and 5 dB, respectively.
AB - Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between the enhanced speech and the clean reference. An MSE-optimized model may not directly improve the performance of an automatic speech recognition (ASR) system. If the goal is to minimize the recognition error, the recognition results should be used to design the objective function for optimizing the SE model. However, an ASR system consists of multiple units, such as acoustic and language models, and its structure is usually complex and not differentiable. In this study, we propose adopting a reinforcement learning (RL) algorithm to optimize the SE model based on the recognition results. We evaluated the proposed RL-based SE system on the Mandarin Chinese broadcast news corpus (MATBN). Experimental results demonstrate that the proposed SE system can effectively improve the ASR results, with notable error rate reductions of 12.40% and 19.23% at signal-to-noise ratios (SNRs) of 0 dB and 5 dB, respectively.
KW - automatic speech recognition
KW - character error rate
KW - deep neural network
KW - reinforcement learning
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85068982399&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8683648
DO - 10.1109/ICASSP.2019.8683648
M3 - Conference contribution
AN - SCOPUS:85068982399
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6750
EP - 6754
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -