A first study on Mandarin prosodic state detection

Yuan-Fu Liao, Wern Jun Wang, Shu Ling Lee, Sin Horng Chen*

*Corresponding author for this work

Research output: Paper (peer-reviewed)

Abstract

This paper proposes a method for detecting the prosodic phrase structure of Mandarin speech. It first employs a recurrent neural network (RNN) to classify each input frame of an utterance into one of three broad classes: syllable initial, syllable final, and silence. The outputs of the RNN then drive a finite-state machine (FSM) that segments the input utterance into four types of segment: three stable types, I (initial), F (final), and S (silence), and one transition type, T (transition). Modeling features are then extracted from the vicinity of the F-segments and used to model the prosodic states of the inter-F-segment intervals. Two prosodic-state modeling schemes are studied. The first uses vector quantization (VQ) to encode the modeling features and directly classify the inter-F-segment intervals into 8 prosodic states. The second uses an RNN, trained with relevant linguistic features as output targets, to represent the prosodic status implicitly through the outputs of its hidden layer; prosodic states are then obtained by vector-quantizing those hidden-layer outputs. Experimental results show that these prosodic states admit linguistically meaningful interpretations.
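The segmentation step can be illustrated with a minimal sketch. The abstract does not give the FSM's states or transition rules, so the code below is a simple threshold-and-run-length stand-in, not the authors' FSM: a frame is labeled with its winning class only when that class's RNN output reaches a confidence threshold, and anything else, including stable runs that are too short, is treated as a transition segment T. The function name `segment_frames` and the parameters `threshold` and `min_stable` are hypothetical.

```python
from typing import List, Sequence, Tuple

CLASSES = ("I", "F", "S")  # initial, final, silence


def segment_frames(
    posteriors: Sequence[Sequence[float]],
    threshold: float = 0.6,   # assumed confidence threshold, not from the paper
    min_stable: int = 3,      # assumed minimum length of a stable segment, in frames
) -> List[Tuple[str, int, int]]:
    """Turn per-frame (p_I, p_F, p_S) scores into (label, start, end) segments.

    `end` is exclusive. Labels are "I", "F", "S" (stable) or "T" (transition).
    """
    # Step 1: label each frame with its winning class if confident, else "T".
    labels = []
    for frame in posteriors:
        best = max(range(len(CLASSES)), key=lambda k: frame[k])
        labels.append(CLASSES[best] if frame[best] >= threshold else "T")

    # Step 2: collapse consecutive identical labels into runs.
    runs: List[List] = []
    for i, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1][2] = i + 1
        else:
            runs.append([lab, i, i + 1])

    # Step 3: a stable run shorter than min_stable is relabeled as a transition.
    for run in runs:
        if run[0] != "T" and run[2] - run[1] < min_stable:
            run[0] = "T"

    # Step 4: merge neighbouring runs that now share a label.
    merged: List[List] = []
    for lab, start, end in runs:
        if merged and merged[-1][0] == lab:
            merged[-1][2] = end
        else:
            merged.append([lab, start, end])
    return [(lab, start, end) for lab, start, end in merged]
```

Given such a segmentation, the inter-F-segment intervals that the abstract refers to are the spans between the end of one F-segment and the start of the next.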

Original language: English
Pages: 399-411
Number of pages: 13
Publication status: Published - 1997
