Mining of association patterns for language modeling

Jen-Tzung Chien, Hung Ying Chen

Research output: Peer-reviewed

3 Citations (Scopus)

Abstract

Language modeling with n-grams is popular in speech recognition and many other applications. The conventional n-gram model suffers from insufficient training data, limited domain knowledge, and an inability to capture long-distance language dependencies. This paper presents a new approach to mining long-distance word associations and incorporating their mutual information into language models. We aim to discover associations among multiple distant words in the training corpus. An efficient algorithm merges frequent word subsets to construct the association patterns. The resulting association pattern n-gram is general; the trigger pair n-gram, which considers only associations between two distant words, is a special case. To improve the modeling, we further compensate for sparse training data via parameter smoothing and for domain mismatch via online adaptive learning. The proposed association pattern n-gram and several hybrid models are successfully applied to speech recognition. We also find that incorporating the mutual information of association patterns significantly reduces the perplexities of language models.
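
The abstract does not spell out the mining algorithm, so the following is only a minimal sketch of the general idea, assuming an Apriori-style level-wise merge of frequent word subsets and sentence-level co-occurrence counts. The function names (`mine_frequent_word_sets`, `pair_pmi`), the `min_support` and `max_size` parameters, and the use of pointwise mutual information for word pairs are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import log


def mine_frequent_word_sets(sentences, min_support=2, max_size=3):
    """Return (support, n_sentences), where support maps each frequent
    word set (a frozenset) to the number of sentences containing it.

    Assumption: an Apriori-style level-wise merge; the paper's actual
    algorithm is not described in the abstract.
    """
    baskets = [frozenset(s.split()) for s in sentences]

    # Level 1: single words that occur in at least min_support sentences.
    counts = Counter(frozenset([w]) for b in baskets for w in b)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    size = 1
    while frequent and size < max_size:
        size += 1
        # Merge frequent sets from the previous level into candidates
        # that are exactly one word larger.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2)
                      if len(a | b) == size}
        # Apriori pruning: keep a candidate only if every subset that is
        # one word smaller was itself frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, size - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)

    return result, len(baskets)


def pair_pmi(support, n_sentences, w1, w2):
    """Sentence-level pointwise mutual information of two words.

    Assumption: probabilities are estimated as the fraction of sentences
    containing the word(s); both words and the pair must be frequent.
    """
    p_xy = support[frozenset([w1, w2])] / n_sentences
    p_x = support[frozenset([w1])] / n_sentences
    p_y = support[frozenset([w2])] / n_sentences
    return log(p_xy / (p_x * p_y))
```

With `max_size=2` the sketch keeps only word pairs, which loosely mirrors the trigger pair special case mentioned in the abstract; larger values allow associations among more than two words.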

Original language: English
Pages: 1369-1372
Number of pages: 4
Publication status: Published - Oct 2004
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, South Korea
Duration: 4 Oct 2004 to 8 Oct 2004

Conference

Conference: 8th International Conference on Spoken Language Processing, ICSLP 2004
Country/Territory: South Korea
City: Jeju, Jeju Island
Period: 4/10/04 to 8/10/04
