Comparing the performance of classic voice-driven assistive systems for dysarthric speech

Wei Zhong Zheng, Ji Yan Han, Hsiu Lien Cheng, Wei Chung Chu, Ko Chiang Chen, Ying Hui Lai*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Voice-driven communication assistive systems, namely speech enhancement (SE), voice conversion (VC), and automatic speech recognition with text-to-speech (ASR-TTS), are recognized approaches for improving the speech intelligibility of dysarthric speakers. However, it is unclear which approach performs best for patients with moderate dysarthria. This study compared the benefits of three different types of classic voice-driven assistive systems for dysarthric patients' speech intelligibility under identical test conditions. Fourteen patients with mild-to-severe dysarthria and five speakers with normal speech were invited to record the training sets for these systems. Five patients with moderate dysarthria were selected to record two additional testing sets, which were used to evaluate the systems' benefits. Evaluation metrics based on Google Automatic Speech Recognition (Google ASR) and listening tests were used to verify each system's speech intelligibility and quality. The speech intelligibility results produced by Google ASR were 7.0%, 22.9%, and 93.8% for the SE, VC, and ASR-TTS systems, respectively. In the listening test, speech intelligibility was 38.7%, 40.5%, and 95.5%, and speech quality scores were 1.81, 2.18, and 4.56, for the SE, VC, and ASR-TTS systems, respectively. The ASR-TTS system thus performed better than the SE and VC systems on all three measures. In addition, t-distributed stochastic neighbor embedding (t-SNE) analysis was used to further compare the systems. The t-SNE results indicated that the phonetic posteriorgram features of the ASR-TTS system provided more stable performance than the speech features (log-power spectrum and spectra) used in the SE and VC systems. These results suggest that ASR-TTS is a promising system for improving the speech intelligibility and quality of patients with moderate dysarthria in future applications.

Original language: English
Article number: 104447
Journal: Biomedical Signal Processing and Control
Volume: 81
State: Published - Mar 2023

Keywords

  • Deep learning
  • Dysarthria
  • Speech intelligibility
  • Voice-driven assistive
