Some advances in language modeling

Chuang Hua Chueh, Meng Sung Wu, Jen-Tzung Chien

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Language modeling aims to extract the linguistic regularities that are crucial in areas such as information retrieval and speech recognition. For Chinese systems in particular, language-dependent properties should be considered in Chinese language modeling. In this chapter, we first survey work on word segmentation and new-word extraction, both of which are essential for estimating Chinese language models. Next, we present several recent approaches that address parameter smoothing and the long-distance limitation of statistical n-gram language models. To overcome the long-distance limitation, we describe association pattern language models. For model smoothing, we present a solution based on the latent semantic analysis framework. To further refine the language model, we adopt the maximum entropy principle and integrate multiple knowledge sources from a collection of text corpora. Discriminative training is also discussed in this chapter. Experiments on perplexity evaluation and Mandarin speech recognition are reported.
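The abstract refers to parameter smoothing in statistical n-gram language models. As a hedged illustration only (not the chapter's own method, which uses latent semantic analysis and maximum entropy), a minimal add-k smoothed bigram model in Python might look like this; the function names and toy corpus are assumptions for the sketch:

```python
from collections import Counter

def train_bigram_model(tokens, k=1.0):
    """Return P(w2 | w1) with add-k smoothing (hypothetical helper name)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))

    def prob(w1, w2):
        # Add-k smoothing reserves probability mass for unseen bigrams,
        # so P(w2 | w1) is never zero even for pairs absent from training.
        return (bigrams[(w1, w2)] + k) / (unigrams[w1] + k * vocab_size)

    return prob

# Toy corpus standing in for training text (an assumption for illustration).
corpus = "the cat sat on the mat the cat ran".split()
p = train_bigram_model(corpus)
```

Because the smoothed probabilities over the vocabulary still sum to one for each history word, the model remains a proper distribution while assigning small, nonzero probability to unseen continuations.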

Original language: English
Title of host publication: Advances in Chinese Spoken Language Processing
Publisher: World Scientific Publishing Co.
Pages: 201-226
Number of pages: 26
ISBN (Electronic): 9789812772961
ISBN (Print): 9812569049, 9789812569042
DOIs
State: Published - 1 Jan 2006
