Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech

C. Y. Yeh*, Shaw-Hwa Hwang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


A new approach to designing an efficient text analyser for Mandarin text-to-speech is proposed, in which the design is driven by the prosody generator. The approach yields a simpler text-analysis structure, a more suitable classification of linguistic features, and a more efficient contribution of linguistic features to the prosody generator. Three heuristic and theoretical methods are used to analyse the capability of each linguistic feature. First, the contribution of each linguistic feature to the prosody generator is examined experimentally. Secondly, the cross-influence of the linguistic features on the prosody generator is analysed. Thirdly, the problem of over- and under-classification of the linguistic features is inspected. The results of these three analyses are then used to design an efficient text analyser. The analyser is evaluated on a total of 35 243 Chinese characters. Word segmentation and POS tagging require only 79 ms of CPU time on a P4-1.4G PC and achieve correct rates of 97.5% and 93.2%, respectively, which confirms that the analyser is both fast and accurate. Moreover, a Mandarin text-to-speech system is implemented to assess the text analysis and its contribution to the prosody generator. More natural and fluent speech is obtained at a lower computational cost. The MOS for prosody is 4.2 for the synthesised speech and 4.8 for the original speech, which is reasonably good.
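The timing and MOS figures quoted above can be put in perspective with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that the 79 ms of CPU time covers word segmentation and POS tagging for the whole 35 243-character test set, which the abstract suggests but does not state explicitly.

```python
# Back-of-the-envelope figures derived from the numbers in the abstract.
# ASSUMPTION: the 79 ms CPU time applies to the full 35 243-character test set.
# All function names here are hypothetical helpers, not part of the paper.

def chars_per_second(n_chars: int, cpu_ms: float) -> float:
    """Throughput in characters per second for a given CPU time in ms."""
    return n_chars / (cpu_ms / 1000.0)


def microseconds_per_char(n_chars: int, cpu_ms: float) -> float:
    """Average CPU time per character, in microseconds."""
    return cpu_ms * 1000.0 / n_chars


def mos_ratio(synthesised: float, original: float) -> float:
    """Synthesised-speech MOS as a fraction of the original-speech MOS."""
    return synthesised / original


if __name__ == "__main__":
    n_chars, cpu_ms = 35243, 79.0
    print(f"throughput: {chars_per_second(n_chars, cpu_ms):,.0f} chars/s")
    print(f"cost: {microseconds_per_char(n_chars, cpu_ms):.2f} us/char")
    print(f"MOS ratio: {mos_ratio(4.2, 4.8):.3f}")
```

Under that assumption, the analyser processes roughly 446 000 characters per second (about 2.2 µs per character), and the synthesised speech reaches 87.5% of the prosody MOS of the original speech.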

Original language: English
Pages (from-to): 793-799
Number of pages: 7
Journal: IEE Proceedings: Vision, Image and Signal Processing
Issue number: 6
State: Published - 1 Dec 2005

