Improving the Efficiency of Dysarthria Voice Conversion System based on Data Augmentation

Wei Zhong Zheng, Ji Yan Han, Chen Yu Chen, Yuh Jer Chang, Ying Hui Lai

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Dysarthria, a speech disorder often caused by neurological damage, impairs patients' control of the vocal muscles, making their speech unclear and communication difficult. Recently, voice-driven methods have been proposed to improve the speech intelligibility of patients with dysarthria. However, most of these methods require a large corpus from both the patient and the target speaker, which is burdensome to record. This study proposes a data augmentation-based voice conversion (VC) system that reduces the recording burden on the speaker. The proposed system, dysarthria voice conversion 3.1 (DVC 3.1), uses a data augmentation approach that combines text-to-speech and a StarGAN-VC architecture to synthesize a large target corpus and a large patient-like corpus. An objective evaluation based on the Google automatic speech recognition (Google ASR) system and a listening test were used to assess the speech intelligibility benefits of DVC 3.1 under free-talk conditions. The DVC system without data augmentation (DVC 3.0) was used for comparison. Objective and subjective evaluations indicated that the proposed DVC 3.1 system improved the Google ASR results of two dysarthria patients by approximately [62.4%, 43.3%] and [55.9%, 57.3%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. Further, DVC 3.1 increased the speech intelligibility of the two patients by approximately [54.2%, 22.3%] and [63.4%, 70.1%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. The proposed DVC 3.1 system shows considerable potential to improve the speech intelligibility of patients with dysarthria and to enhance the quality of their verbal communication.
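The abstract does not state how the Google ASR score was computed. As a minimal sketch of one common way to turn an ASR transcript into an objective intelligibility score, the Python below computes word-level recognition accuracy (one minus the word error rate, floored at zero). The function names, the whitespace tokenization, and the example sentences are assumptions made for illustration, not the authors' evaluation code; for languages without whitespace-delimited words, character-level tokens would be the analogous choice.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of tokens."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, asr_text):
    """Hypothetical scoring helper: 1 - WER, floored at 0.

    Assumes whitespace-separated words and case-insensitive matching.
    """
    ref = reference_text.lower().split()
    hyp = asr_text.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    # Illustrative strings only; not data from the study.
    prompt = "open the door please"
    print(word_accuracy(prompt, "open a door"))
    print(word_accuracy(prompt, "open the door please"))
```

Under this assumed metric, the same prompt would be scored for unprocessed speech, DVC 3.0 output, and DVC 3.1 output, and the scores compared per patient.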

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Transactions on Neural Systems and Rehabilitation Engineering
State: Accepted/In press - 2023

Keywords

  • deep learning
  • dysarthric patient
  • phonetic posteriorgram
  • voice conversion
