Unsupervised Prosody Labeling for Constructing Mandarin TTS

Chen Yu Chiang, Sin Horng Chen, Yih Ru Wang

Research output: Paper (peer-reviewed)

Abstract

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus for use in developing a Mandarin text-to-speech (TTS) system. Adopting a four-layer prosody hierarchy, the proposed method performs unsupervised segmental clustering: it iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented constituents using both prosodic and linguistic features. Experimental results showed that the method could effectively label important prosodic cues and thereby improve prosody prediction in an HMM-based text-to-speech system. The method is therefore promising and could be widely applied to labeling other large speech corpora.
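The alternation between segmenting and modeling that the abstract describes can be illustrated with a deliberately small toy. The sketch below is not the authors' method: it assumes a single prosodic feature (inter-syllable pause duration), only two break classes instead of a four-layer hierarchy, no linguistic features, and nearest-mean assignment in place of whatever statistical models the paper uses. The function names `label_breaks` and `segment` are hypothetical.

```python
from statistics import mean


def label_breaks(pauses, max_iter=50):
    """Toy two-class clustering of inter-syllable pause durations (seconds).

    Returns one label per juncture: 1 = prosodic break, 0 = no break.
    Assumption: a single feature and two classes, purely for illustration.
    """
    # Initialise the two class means from the extremes of the data.
    means = [min(pauses), max(pauses)]
    labels = None
    for _ in range(max_iter):
        # Segmentation step: assign each juncture to the nearest class mean.
        new_labels = [
            0 if abs(p - means[0]) <= abs(p - means[1]) else 1 for p in pauses
        ]
        if new_labels == labels:
            break  # labels stopped changing, so the alternation has converged
        labels = new_labels
        # Modeling step: re-estimate each class mean from its current members.
        for k in (0, 1):
            members = [p for p, lab in zip(pauses, labels) if lab == k]
            if members:
                means[k] = mean(members)
    return labels


def segment(syllables, labels):
    """Split a syllable list into constituents at junctures labelled 1."""
    constituents, current = [], [syllables[0]]
    for syl, lab in zip(syllables[1:], labels):
        if lab == 1:
            constituents.append(current)
            current = []
        current.append(syl)
    constituents.append(current)
    return constituents


if __name__ == "__main__":
    # Made-up pause durations between seven placeholder syllables.
    pauses = [0.01, 0.02, 0.30, 0.015, 0.25, 0.02]
    syllables = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
    print(segment(syllables, label_breaks(pauses)))
```

Each pass relabels the junctures given the current class means and then re-estimates the means from the new labels, stopping once the labels no longer change. A realistic labeler would replace the nearest-mean rule with models over several prosodic and linguistic features.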

Original language: English
Pages: 264-269
Number of pages: 6
Publication status: Published - 2010
Event: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010 - Kyoto, Japan
Duration: 22 Sep 2010 to 24 Sep 2010

Conference

Conference: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010
Country/Territory: Japan
City: Kyoto
Period: 22/09/10 to 24/09/10
