TY - JOUR
T1 - Enhancing intelligibility of dysarthric speech using gated convolutional-based voice conversion system
AU - Chen, Chen Yu
AU - Zheng, Wei Zhong
AU - Wang, Syu Siang
AU - Tsao, Yu
AU - Li, Pei Chun
AU - Lai, Ying Hui
N1 - Publisher Copyright:
Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
AB - Voice conversion (VC) is a well-known approach to improving the communication efficiency of patients with dysarthria. In this study, we used a gated convolutional neural network (Gated CNN) with phonetic posteriorgram (PPG) features to perform VC for patients with dysarthria, and used a WaveRNN vocoder to synthesize the converted speech. In addition, two well-known deep learning models, a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network, were compared with the Gated CNN within the proposed VC system. Evaluations using Google ASR as an objective speech intelligibility metric, together with a listening test, showed that the proposed system performed better than the original dysarthric speech. Moreover, the Gated CNN model outperformed the other two models and required fewer parameters than the BLSTM. These results suggest that the Gated CNN can be used in a communication-assistive system to overcome the degradation of speech intelligibility caused by dysarthria.
KW - Deep learning
KW - Dysarthric speech
KW - Patients with dysarthria
KW - Speech intelligibility
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=85097317626&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-1367
DO - 10.21437/Interspeech.2020-1367
M3 - Conference article
AN - SCOPUS:85097317626
SN - 2308-457X
VL - 2020-October
SP - 4686
EP - 4690
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -