On latent semantic language modeling and smoothing

Jen-Tzung Chien, Meng Sung Wu, Hua Jui Peng

Research output: Contribution to conference › Paper › peer-review

4 Scopus citations

Abstract

Language modeling plays a critical role in automatic speech recognition. Conventional n-gram language models suffer from a poor representation of historical words and from having to estimate unseen parameters with insufficient training data. In this work, latent semantic information is explored for language modeling and parameter smoothing. For language modeling, we present a new representation of historical words obtained by retrieving the most likely relevant document. In addition, we develop a novel parameter smoothing method in which the language models of seen and unseen words are estimated by interpolating those of the k nearest seen words in the training corpus. The interpolation coefficients are determined according to the closeness of words in the semantic space. In the experiments, the proposed modeling and smoothing methods significantly reduce the perplexity of language models at moderate computational cost.
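The smoothing idea described above can be sketched as a k-nearest-neighbor interpolation in a semantic space. The sketch below is illustrative only: the function name, the use of cosine similarity for the interpolation coefficients, and the word-vector inputs are assumptions, not the paper's actual LSA-based formulation.

```python
import numpy as np

def knn_smoothed_prob(word_vec, seen_vecs, seen_probs, k=3):
    """Estimate a probability for a (possibly unseen) word by
    interpolating the probabilities of its k nearest seen words
    in a semantic space. Weights here are normalized cosine
    similarities; the paper instead derives coefficients from
    closeness in the latent semantic space."""
    # Cosine similarity between the target word and every seen word.
    norms = np.linalg.norm(seen_vecs, axis=1) * np.linalg.norm(word_vec)
    sims = seen_vecs @ word_vec / np.clip(norms, 1e-12, None)
    # Indices of the k most similar seen words.
    idx = np.argsort(sims)[-k:]
    weights = np.clip(sims[idx], 0.0, None)
    if weights.sum() == 0.0:
        weights = np.ones(len(idx))  # fall back to uniform weights
    weights /= weights.sum()
    # Interpolate the probabilities of the nearest seen words.
    return float(weights @ seen_probs[idx])
```

With equidistant neighbors the estimate reduces to a simple average of their probabilities; otherwise closer words dominate the interpolation, mirroring the closeness-based coefficients described in the abstract.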

Original language: English
Pages: 1373-1376
Number of pages: 4
State: Published - Oct 2004
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration: 4 Oct 2004 - 8 Oct 2004

Conference

Conference: 8th International Conference on Spoken Language Processing, ICSLP 2004
Country/Territory: Korea, Republic of
City: Jeju, Jeju Island
Period: 4/10/04 - 8/10/04
