Bayesian latent topic clustering model

Meng Sung Wu*, Jen-Tzung Chien

*Corresponding author for this work

    Research output: Conference article, peer-reviewed

    1 citation (Scopus)

    Abstract

    Document modeling is important for document retrieval and categorization. Probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) are popular document-modeling paradigms in which word/document correlations are inferred through latent topics. In PLSA and LDA, unseen words and unseen documents are not explicitly represented at the same time, which constrains model generalization. This paper presents the Bayesian latent topic clustering (BLTC) model for document representation. The posterior distributions, formed by combining Dirichlet priors with multinomial distributions, are calculated not only at the document level but also at the word level, so that both unseen words and unseen documents are modeled. An efficient variational inference method based on Gibbs sampling is presented to calculate the posterior probabilities of complex variables. In experiments on TREC and Reuters-21578, the proposed BLTC outperforms PLSA and LDA in both model perplexity and classification accuracy.
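
As a rough illustration of two ingredients the abstract names, the sketch below computes the posterior mean of a multinomial word distribution under a symmetric Dirichlet prior, then scores held-out text by perplexity. It is not the BLTC model or its inference procedure, only a toy unigram example. The function names, the tiny vocabulary, and the value alpha = 0.1 are assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_multinomial_posterior(counts, vocab, alpha=0.1):
    """Posterior mean of a multinomial under a symmetric Dirichlet(alpha) prior.

    Each word gets (count + alpha) / (total + alpha * |vocab|), so a word
    never seen in training still receives a nonzero probability.
    """
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def perplexity(probs, tokens):
    """exp of the negative average log-likelihood per token (lower is better)."""
    log_lik = sum(math.log(probs[w]) for w in tokens)
    return math.exp(-log_lik / len(tokens))


# Toy data (hypothetical): "speech" never occurs in the training tokens.
vocab = ["topic", "model", "document", "word", "speech"]
train = ["topic", "model", "topic", "document", "word"]
held_out = ["topic", "speech", "model"]

posterior = dirichlet_multinomial_posterior(Counter(train), vocab)
ppl = perplexity(posterior, held_out)
print(f"P(speech) = {posterior['speech']:.4f}, held-out perplexity = {ppl:.2f}")
```

Without the prior, the unseen word "speech" would have zero probability and the held-out perplexity would be undefined. Handling unseen items is the generalization issue the abstract raises, here shown only at the word level of a single unigram distribution.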

    Original language: English
    Pages (from - to): 2162-2165
    Number of pages: 4
    Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publication status: Published - 1 December 2008
    Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
    Duration: 22 September 2008 to 26 September 2008
