TY - GEN
T1 - Dysarthric Speech Enhancement Based on Convolution Neural Network
AU - Wang, Syu Siang
AU - Tsao, Yu
AU - Zheng, Wei Zhong
AU - Yeh, Hsiu Wei
AU - Li, Pei Chun
AU - Fang, Shih Hau
AU - Lai, Ying Hui
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Generally, patients with dysarthria produce distorted speech with reduced intelligibility for both humans and machines. To enhance the intelligibility of dysarthric speech, we applied a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches suppress noise components in a noise-corrupted input, thereby improving sound quality and intelligibility simultaneously. In this study, we focus on reconstructing severely distorted signals from dysarthric speech to improve intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. Training requires paired dysarthric-normal speech utterances. We adopt a dynamic time warping technique to align the dysarthric-normal utterances. The resulting training data are used to train a CNN-based SE model. The proposed SE system is evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results show that the proposed method notably improves recognition performance by more than 10% for both ASR and human listeners compared with unprocessed dysarthric speech.
AB - Generally, patients with dysarthria produce distorted speech with reduced intelligibility for both humans and machines. To enhance the intelligibility of dysarthric speech, we applied a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches suppress noise components in a noise-corrupted input, thereby improving sound quality and intelligibility simultaneously. In this study, we focus on reconstructing severely distorted signals from dysarthric speech to improve intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. Training requires paired dysarthric-normal speech utterances. We adopt a dynamic time warping technique to align the dysarthric-normal utterances. The resulting training data are used to train a CNN-based SE model. The proposed SE system is evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results show that the proposed method notably improves recognition performance by more than 10% for both ASR and human listeners compared with unprocessed dysarthric speech.
UR - http://www.scopus.com/inward/record.url?scp=85138128914&partnerID=8YFLogxK
U2 - 10.1109/EMBC48229.2022.9871531
DO - 10.1109/EMBC48229.2022.9871531
M3 - Conference contribution
C2 - 36085875
AN - SCOPUS:85138128914
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 60
EP - 64
BT - 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2022
Y2 - 11 July 2022 through 15 July 2022
ER -