Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

Li-Jen Yang*, Chao-Han Huck Yang, Jen-Tzung Chien*

*Corresponding author for this work

Research output: Conference article, peer-reviewed

4 citations (Scopus)

Abstract

This paper presents parameter-efficient learning (PEL) to develop low-resource accent adaptation for text-to-speech (TTS). A resource-efficient adaptation of a frozen pre-trained TTS model is developed using only 0.8% to 1.2% of the original trainable parameters while achieving competitive performance in voice synthesis. Motivated by a theoretical foundation of optimal transport (OT), this study carries out PEL for TTS in which, in addition to the supervised training loss, an auxiliary unsupervised loss based on OT is introduced to maximize a difference between the pre-trained source domain and the (unseen) target domain. Further, we leverage this unsupervised loss refinement to boost system performance via either the sliced Wasserstein distance or the maximum mean discrepancy. The merit of this work is demonstrated by realizing PEL solutions based on residual adapter learning and model reprogramming, evaluated on Mandarin accent adaptation. Experimental results show that the proposed methods achieve naturalness competitive with parameter-efficient decoder fine-tuning and that the auxiliary unsupervised loss empirically improves model performance.
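To make the described recipe concrete, below is a minimal PyTorch sketch of the two ingredients the abstract names: a residual adapter wrapped around a frozen pre-trained layer, and a sliced Wasserstein distance usable as the auxiliary unsupervised OT loss. This is not the authors' implementation; the module names, the bottleneck width, the equal source/target batch sizes, and the loss weight lambda_ot are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ResidualAdapter(nn.Module):
        # Bottleneck adapter; only these weights are trained while the
        # pre-trained TTS backbone stays frozen.
        def __init__(self, dim, bottleneck=32):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, h):
            # Residual connection: a near-zero-initialized adapter leaves
            # the frozen backbone's output essentially unchanged at first.
            return h + self.up(torch.relu(self.down(h)))

    def sliced_wasserstein(x, y, n_proj=64):
        # Monte-Carlo sliced 1-Wasserstein distance between two feature
        # sets x, y of shape (n, dim); assumes equal sample counts n.
        proj = torch.randn(x.size(1), n_proj, device=x.device)
        proj = proj / proj.norm(dim=0, keepdim=True)  # unit directions
        x_p, _ = torch.sort(x @ proj, dim=0)  # sorted 1-D projections
        y_p, _ = torch.sort(y @ proj, dim=0)
        return (x_p - y_p).abs().mean()

    # Hypothetical training objective combining the supervised TTS loss
    # with the weighted unsupervised OT term on pooled hidden features:
    #   loss = tts_loss + lambda_ot * sliced_wasserstein(src_feats, tgt_feats)

Under this reading, the adapter supplies the small trainable parameter budget, and the OT term is an auxiliary objective computed on hidden features from the source and target domains; the maximum mean discrepancy could be swapped in for sliced_wasserstein without changing the surrounding training loop.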

Original language: English
Pages (from-to): 4354-4358
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
Publication status: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023
