Some advances in language modeling

Chuang Hua Chueh, Meng Sung Wu, Jen-Tzung Chien

Research output: Chapter (peer-reviewed)


Language modeling aims to extract linguistic regularities that are crucial in information retrieval and speech recognition. For Chinese systems in particular, language-dependent properties must be taken into account when building language models. In this chapter, we first survey work on word segmentation and new-word extraction, both of which are essential for estimating Chinese language models. Next, we present several recent approaches to two weaknesses of statistical n-gram language models: parameter smoothing and their inability to capture long-distance dependencies. To address the long-distance limitation, we describe association pattern language models. For model smoothing, we present a solution based on the latent semantic analysis framework. To further refine the language model, we also adopt the maximum entropy principle and integrate multiple knowledge sources extracted from a text corpus. Discriminative training is also discussed. Experimental results on perplexity evaluation and Mandarin speech recognition are reported.
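As background for the smoothing and perplexity terms above, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, scored by perplexity on held-out sentences. This is a generic textbook baseline, not the association-pattern, LSA-based, or maximum-entropy methods of the chapter. The function names (`train_bigram`, `add_one_prob`, `perplexity`), the `<s>`/`</s>` padding convention, and the closed-vocabulary assumption are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count history and bigram frequencies; each sentence is a list of words.
    Hypothetical helper: pads every sentence with <s> and </s>."""
    histories = Counter()
    bigrams = Counter()
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            histories[prev] += 1
            bigrams[(prev, cur)] += 1
    return histories, bigrams, vocab

def add_one_prob(prev, cur, histories, bigrams, vocab_size):
    # Add-one smoothing: every bigram, seen or unseen, receives one extra count,
    # so no event in the (assumed closed) vocabulary gets zero probability.
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size)

def perplexity(sentences, histories, bigrams, vocab):
    """Perplexity = exp of the average negative log-probability per predicted token."""
    log_prob = 0.0
    n_tokens = 0
    vocab_size = len(vocab)
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log(add_one_prob(prev, cur, histories, bigrams, vocab_size))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

if __name__ == "__main__":
    train = [["a", "b"], ["a", "c"]]
    hist, bi, vocab = train_bigram(train)
    print(perplexity([["a", "b"]], hist, bi, vocab))
```

A lower perplexity means the model assigns higher probability to the held-out text. Add-one smoothing is chosen here only because it is the simplest scheme to state; practical systems use stronger methods, such as the LSA-based smoothing the chapter presents.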

Host publication title: Advances in Chinese Spoken Language Processing
Publisher: World Scientific Publishing Co.
ISBN (Print): 9812569049, 9789812569042
Publication status: Published - 1 Jan 2006

