Abstract
This paper explores speaking rate (SR) variation in Mandarin read speech. Rather than assuming that each utterance is produced at a constant, global speaking rate, this study estimates a local speaking rate for each prosodic unit in an utterance. The exploration builds on the existing speaking rate-dependent hierarchical prosodic model (SR-HPM). The main idea is to first use the SR-HPM to analyze the prosodic structures of utterances and extract the prosodic units. A local speaking rate is then estimated for each prosodic unit (the prosodic phrase in this study). The local SR estimation compensates for several major influencing factors: tone, base syllable type, prosodic structure, and the speaking rate of the higher prosodic units (utterance and BG/PG). A syntactic-local SR model is constructed and used in the prosody generation of Mandarin TTS. Experimental results on a large read speech corpus recorded by a professional female announcer showed that prosody generated with local speaking rate variations was more vivid than prosody generated with a constant speaking rate.
Original language | American English |
---|---|
Pages (from-to) | 42-46 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2018-September |
DOIs | |
Publication status | Published - 2018 |
Event | 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India. Duration: 2 Sep 2018 → 6 Sep 2018 |