Personalized Taiwanese Speech Synthesis using Cascaded ASR and TTS Framework

Yuan Fu Liao, Wen Han Hsu, Chen Ming Pan, Wern Jun Wang, Matus Pleva, Daniel Hladek

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

To help revive the endangered Taiwanese language, this paper leverages the large-scale Taiwanese Across Taiwan (TAT) corpus to build personalized Taiwanese speech synthesizers based on a cascade of automatic speech recognition (ASR) and text-to-speech (TTS), intended to help young people learn to speak Taiwanese. This paradigm not only alleviates the low-resource, nonparallel-corpus, and cross-lingual training data problems but also dramatically reduces the fine-tuning data size and training time. Experimental results on Taiwanese-to-Taiwanese and Mandarin-to-Taiwanese voice conversion tasks show that the approach produces good personalized Taiwanese TTS with only approximately 3 minutes of data in both cases.

Original language: English
Title of host publication: 2022 32nd International Conference Radioelektronika, RADIOELEKTRONIKA 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728186863
State: Published - 2022
Event: 32nd International Conference Radioelektronika, RADIOELEKTRONIKA 2022 - Kosice, Slovakia
Duration: 21 Apr 2022 to 22 Apr 2022

Publication series

Name: 2022 32nd International Conference Radioelektronika, RADIOELEKTRONIKA 2022 - Proceedings

Conference

Conference: 32nd International Conference Radioelektronika, RADIOELEKTRONIKA 2022
Country/Territory: Slovakia
City: Kosice
Period: 21/04/22 to 22/04/22

Keywords

  • speech recognition
  • Taiwanese speech corpus
  • Taiwanese speech synthesis
  • voice conversion
