Learning Contrastive Emotional Nuances in Speech Synthesis

Bryan Gautama Ngo, Mahdin Rohmatillah, Jen-Tzung Chien

Research output: Conference contribution › Peer-reviewed

Abstract

Prosody is a crucial speech feature in emotional text-to-speech (TTS), as different emotions have distinct prosodic characteristics. Existing works in emotional TTS have primarily utilized the emotion labels in a dataset by applying an auxiliary emotion classification loss to enhance emotional nuances in the model. However, this approach may only partially leverage the potential of emotion labels. Accordingly, this paper proposes a supervised contrastive approach that effectively utilizes emotion labels and enables the model to distinguish the prosody of different emotions. Furthermore, this work also explores unsupervised contrastive learning for emotional TTS when emotion labels are missing. In particular, the proposed TTS architecture supports cross-speaker emotion transfer learning, allowing accurate speech generation even without specific prosody from the target speaker. Experimental results on emotional datasets demonstrate the effectiveness of the proposed method.
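As an illustration only (not the authors' released implementation), the sketch below shows a SupCon-style supervised contrastive loss over utterance-level emotion embeddings, in the spirit of the approach described above: embeddings sharing an emotion label are pulled together while embeddings of different emotions are pushed apart. The function name, temperature, embedding dimension, and toy labels are assumptions made for the example.

```python
# Illustrative sketch of a supervised contrastive loss over emotion embeddings.
# Not the paper's code; names and hyperparameters are hypothetical.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, emotion_labels, temperature=0.1):
    """embeddings: (N, D) prosody/emotion embeddings; emotion_labels: (N,) integer emotion ids."""
    z = F.normalize(embeddings, dim=1)                      # unit-norm embeddings
    n = z.size(0)
    logits = torch.matmul(z, z.T) / temperature             # pairwise similarities
    not_self = ~torch.eye(n, dtype=torch.bool, device=z.device)

    # Numerically stable log-softmax over all non-self pairs (the contrastive denominator).
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()
    exp_logits = torch.exp(logits) * not_self
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))

    # Positives: other utterances carrying the same emotion label.
    pos_mask = (emotion_labels.unsqueeze(0) == emotion_labels.unsqueeze(1)) & not_self
    pos_count = pos_mask.sum(dim=1).clamp(min=1)            # guard against label singletons
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Toy usage: 8 utterance-level embeddings, 3 hypothetical emotion classes.
emb = torch.randn(8, 128)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(emb, labels).item())
```

In such a setup the loss would typically be added to the TTS training objective in place of (or alongside) an auxiliary emotion classification loss, so that the prosody encoder learns emotion-discriminative embeddings directly from the label structure.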

Original language: English
Title of host publication: 2024 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Proceedings
Editors: Ming-Hsiang Su, Jui-Feng Yeh, Yuan-Fu Liao, Chi-Chun Lee, Yu Taso
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798331506032
DOIs
Publication status: Published - 2024
Event: 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Hsinchu, Taiwan
Duration: 17 Oct 2024 → 19 Oct 2024

Publication series

Name: 2024 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Proceedings

Conference

Conference: 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024
Country/Territory: Taiwan
City: Hsinchu
Period: 17/10/24 → 19/10/24
