Bayesian latent topic clustering model

Meng Sung Wu*, Jen-Tzung Chien

*Corresponding author for this work

Research output: Conference article, peer-reviewed

1 Citation (Scopus)

Abstract

Document modeling is important for document retrieval and categorization. Probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) are popular document models in which word/document correlations are inferred through latent topics. However, PLSA and LDA do not explicitly represent unseen words and unseen documents at the same time, which constrains model generalization. This paper presents the Bayesian latent topic clustering (BLTC) model for document representation. The posterior distributions, formed by combining Dirichlet priors with multinomial distributions, are calculated not only at the document level but also at the word level, which addresses the modeling of unseen words and documents. An efficient variational inference method based on Gibbs sampling is presented to calculate the posterior probability of complex variables. In experiments on TREC and Reuters-21578, the proposed BLTC performs better than PLSA and LDA in model perplexity and classification accuracy.
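The abstract relies on two standard ingredients: a posterior obtained by combining a Dirichlet prior with a multinomial distribution, and perplexity as an evaluation measure. The sketch below illustrates only those two building blocks for a single word distribution. It is not the BLTC model, and the function names, the toy vocabulary, and the symmetric prior value `alpha=0.1` are all assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_posterior_mean(counts, vocab, alpha=0.1):
    """Posterior mean of multinomial word probabilities under a
    symmetric Dirichlet(alpha) prior (alpha is an assumed value)."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def perplexity(word_probs, tokens):
    """exp of the negative average log-likelihood per token."""
    log_lik = sum(math.log(word_probs[w]) for w in tokens)
    return math.exp(-log_lik / len(tokens))


# Toy data (hypothetical): "query" never occurs in the training tokens.
vocab = ["topic", "model", "speech", "query"]
train = ["topic", "model", "topic", "speech"]
held_out = ["topic", "query"]

probs = dirichlet_posterior_mean(Counter(train), vocab)
print(probs["query"])               # nonzero thanks to the prior
print(perplexity(probs, held_out))  # lower is better
```

Because the prior adds pseudo-counts to every vocabulary entry, a word that was never observed still receives nonzero probability, so the held-out perplexity stays finite. This is the simplest form of the unseen-word problem that the abstract mentions.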

Original language: English
Pages (from-to): 2162-2165
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 1 December 2008
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 22 September 2008 to 26 September 2008
