Abstract
Language modeling plays a critical role for automatic speech recognition. Conventionally, the n-gram language models suffer from lacking good representation of historical words and estimating unseen parameters from insufficient training data. In this work, the latent semantic information is explored for language modeling and parameter smoothing. In language modeling, we present a new representation of historical words via retrieving the most likely relevance document. Besides, we also develop a novel parameter smoothing method where the language models of seen and unseen words are estimated by interpolating those of k nearest seen words in training corpus. The interpolation coefficients are determined according to the closeness of words in semantic space. In the experiments, the proposed modeling and smoothing methods can significantly reduce the perplexities of language models with moderate computation cost.
Original language | English |
---|---|
Pages | 1373-1376 |
Number of pages | 4 |
State | Published - Oct 2004 |
Event | 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of Duration: 4 Oct 2004 → 8 Oct 2004 |
Conference
Conference | 8th International Conference on Spoken Language Processing, ICSLP 2004 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju, Jeju Island |
Period | 4/10/04 → 8/10/04 |