TY - CHAP
T1 - Some advances in language modeling
AU - Chueh, Chuang-Hua
AU - Wu, Meng-Sung
AU - Chien, Jen-Tzung
N1 - Publisher Copyright:
© 2007 by World Scientific Publishing Co. Pte. Ltd. All rights reserved.
PY - 2006/1/1
Y1 - 2006/1/1
N2 - Language modeling aims to extract linguistic regularities that are crucial in information retrieval and speech recognition. For Chinese systems in particular, language-dependent properties should be considered in Chinese language modeling. In this chapter, we first survey work on word segmentation and new-word extraction, which is essential for estimating Chinese language models. Next, we present several recent approaches to the issues of parameter smoothing and the long-distance limitation in statistical n-gram language models. To tackle the long-distance limitation, we present association pattern language models. For model smoothing, we present a solution based on the latent semantic analysis framework. To refine the language model further, we also adopt the maximum entropy principle and integrate multiple knowledge sources from a collection of text corpora. Discriminative training is also discussed in this chapter. Experiments on perplexity evaluation and Mandarin speech recognition are reported.
AB - Language modeling aims to extract linguistic regularities that are crucial in information retrieval and speech recognition. For Chinese systems in particular, language-dependent properties should be considered in Chinese language modeling. In this chapter, we first survey work on word segmentation and new-word extraction, which is essential for estimating Chinese language models. Next, we present several recent approaches to the issues of parameter smoothing and the long-distance limitation in statistical n-gram language models. To tackle the long-distance limitation, we present association pattern language models. For model smoothing, we present a solution based on the latent semantic analysis framework. To refine the language model further, we also adopt the maximum entropy principle and integrate multiple knowledge sources from a collection of text corpora. Discriminative training is also discussed in this chapter. Experiments on perplexity evaluation and Mandarin speech recognition are reported.
UR - http://www.scopus.com/inward/record.url?scp=84968912189&partnerID=8YFLogxK
U2 - 10.1142/9789812772961_0009
DO - 10.1142/9789812772961_0009
M3 - Chapter
AN - SCOPUS:84968912189
SN - 9812569049
SN - 9789812569042
SP - 201
EP - 226
BT - Advances in Chinese Spoken Language Processing
PB - World Scientific Publishing Co.
ER -