Abstract
Language modeling aims to extract linguistic regularities, which are crucial in areas such as information retrieval and speech recognition. For Chinese systems in particular, language-dependent properties should be taken into account in language modeling. In this chapter, we first survey work on word segmentation and new word extraction, which are essential for the estimation of Chinese language models. Next, we present several recent approaches that address the issues of parameter smoothing and the long-distance limitation in statistical n-gram language models. To tackle the inability of n-grams to capture long-distance dependencies, we present association pattern language models. For the issue of model smoothing, we present a solution based on the latent semantic analysis framework. To further refine the language model, we also adopt the maximum entropy principle and integrate multiple knowledge sources from a collection of text corpora. Discriminative training is also discussed in this chapter. Experiments on perplexity evaluation and Mandarin speech recognition are reported.
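The chapter's own methods (association patterns, latent semantic analysis, maximum entropy, discriminative training) are more elaborate than what can be shown here; as a minimal illustrative sketch only, the Python below shows how perplexity is typically computed for an n-gram model with a simple smoothing scheme (Jelinek-Mercer interpolation of a bigram with a unigram). The toy corpus, the interpolation weight `lam`, and all function names are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over pre-segmented sentences with <s>/</s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def interpolated_prob(w_prev, w, unigrams, bigrams, lam=0.7):
    """Jelinek-Mercer smoothing: lam * P(w | w_prev) + (1 - lam) * P(w)."""
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total if total else 0.0
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(sentences, unigrams, bigrams, lam=0.7):
    """Perplexity = exp(-average log-probability per predicted token)."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w_prev, w in zip(tokens[:-1], tokens[1:]):
            p = interpolated_prob(w_prev, w, unigrams, bigrams, lam)
            log_prob += math.log(p) if p > 0 else float("-inf")
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Toy usage: sentences are assumed to be already word-segmented,
# since segmentation precedes Chinese language model estimation.
train = [["中文", "語言", "模型"], ["語言", "模型", "評估"]]
test = [["中文", "語言", "模型", "評估"]]
uni, bi = train_bigram_counts(train)
print(perplexity(test, uni, bi))
```

Lower perplexity on held-out text indicates a better-fitting model; the same measure is used in the experiments reported in the chapter, albeit with far larger corpora and more refined smoothing.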
| Original language | English |
|---|---|
| Title of host publication | Advances in Chinese Spoken Language Processing |
| Publisher | World Scientific Publishing Co. |
| Pages | 201-226 |
| Number of pages | 26 |
| ISBN (Electronic) | 9789812772961 |
| ISBN (Print) | 9812569049, 9789812569042 |
| DOIs | |
| Publication status | Published - 1 Jan 2006 |