Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

Li-Jen Yang*, Chao-Han Huck Yang, Jen-Tzung Chien*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

7 Scopus citations

Abstract

This paper presents a parameter-efficient learning (PEL) approach to low-resource accent adaptation for text-to-speech (TTS). A resource-efficient adaptation of a frozen pre-trained TTS model is developed using only 0.8% to 1.2% of the original trainable parameters, while achieving competitive performance in voice synthesis. Motivated by the theoretical foundation of optimal transport (OT), this study carries out PEL for TTS in which an auxiliary unsupervised loss based on OT is introduced, in addition to the supervised training loss, to maximize the difference between the pre-trained source domain and the (unseen) target domain. We further leverage this unsupervised loss refinement to boost system performance via either the sliced Wasserstein distance or the maximum mean discrepancy. The merit of this work is demonstrated by realizing PEL solutions based on residual adapter learning and model reprogramming, evaluated on Mandarin accent adaptation. Experimental results show that the proposed methods achieve naturalness competitive with parameter-efficient decoder fine-tuning, and that the auxiliary unsupervised loss empirically improves model performance.
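
The abstract refers to an OT-based auxiliary loss computed with either a sliced Wasserstein distance or maximum mean discrepancy. The following is a minimal, hypothetical PyTorch sketch of how such an auxiliary distribution loss could be computed over source- and target-domain features and combined with a supervised TTS loss; the function names, feature shapes, and the weighting `lam` are illustrative assumptions and not the authors' implementation, and whether the divergence term is penalized or encouraged depends on the paper's adaptation objective.

    # Illustrative sketch (not the authors' code): auxiliary unsupervised losses
    # comparing source- and target-domain feature batches, via either a sliced
    # Wasserstein distance or maximum mean discrepancy (MMD) with an RBF kernel.
    import torch

    def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor, n_proj: int = 64) -> torch.Tensor:
        """Sliced Wasserstein distance between feature batches x [N, D] and y [N, D]."""
        d = x.size(1)
        theta = torch.randn(n_proj, d, device=x.device)
        theta = theta / theta.norm(dim=1, keepdim=True)   # random unit projection directions
        px = torch.sort(x @ theta.T, dim=0).values        # sorted 1-D projections, [N, n_proj]
        py = torch.sort(y @ theta.T, dim=0).values        # assumes equal batch sizes N
        return (px - py).abs().mean()

    def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        """Maximum mean discrepancy between x and y with an RBF kernel."""
        def k(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    if __name__ == "__main__":
        # Hypothetical features from the frozen pre-trained model (source) and the
        # adapted branch (target); shapes and the weight `lam` are assumptions.
        src_feats = torch.randn(32, 256)
        tgt_feats = torch.randn(32, 256)
        supervised_loss = torch.tensor(1.0)               # placeholder for the TTS training loss
        lam = 0.1                                         # sign/magnitude set by the adaptation objective
        aux = sliced_wasserstein(src_feats, tgt_feats)    # or: mmd_rbf(src_feats, tgt_feats)
        total_loss = supervised_loss + lam * aux
        print(float(total_loss))
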

Original language: English
Pages (from-to): 4354-4358
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
State: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 – 24 Aug 2023

Keywords

  • Parameter-efficient learning
  • accent adaptation
  • optimal transport
  • pre-trained model
  • text-to-speech
