Bayesian topic mixture model for information retrieval

Meng Sung Wu, Hsuan Jui Hsu, Jen-Tzung Chien

Research output: Paper › peer-review

Abstract

In studies of automatic text processing, it is popular to apply a probabilistic topic model to infer word correlation through latent topic variables. Probabilistic latent semantic analysis (PLSA) is such a model, in which each word in a document is seen as a sample from a mixture model whose components are multinomial distributions. Although the PLSA model deals with the issue of multiple topics, each topic model is quite simple, and the word burstiness phenomenon is not taken into account. In this study, we present a new Bayesian topic mixture model (BTMM) to overcome the burstiness problem inherent in the multinomial distribution. Accordingly, we use the Dirichlet distribution to represent topic information beyond the document level. Conceptually, documents in the same class are generated by the associated multinomial distribution. In experiments on a TREC text corpus, we report average precision and model perplexity to demonstrate the superiority of the proposed BTMM method.
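
The generative view described in the abstract — topic multinomials drawn from a Dirichlet prior, then words sampled through a latent topic per position — can be illustrated with a minimal sketch. This is an assumed toy illustration of the general Dirichlet-multinomial topic mixture idea, not the authors' implementation or the paper's data; the vocabulary, prior values, and counts are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and two latent topics (illustrative only).
vocab = ["bayes", "topic", "model", "retrieval", "query"]
n_topics, n_words = 2, 40

# Each topic is a multinomial over the vocabulary. Placing a Dirichlet
# prior over these multinomials (the Bayesian step) is what allows the
# model to capture word burstiness beyond a single fixed multinomial.
alpha = np.full(len(vocab), 0.5)           # assumed symmetric Dirichlet prior
topic_word = rng.dirichlet(alpha, size=n_topics)

# Document-level topic proportions, also given a Dirichlet prior.
theta = rng.dirichlet(np.ones(n_topics))

# Generate a document: draw a latent topic per word position,
# then draw the word from that topic's multinomial.
doc = []
for _ in range(n_words):
    z = rng.choice(n_topics, p=theta)
    w = rng.choice(len(vocab), p=topic_word[z])
    doc.append(vocab[w])
```

Each row of `topic_word` sums to one, so every topic is a valid multinomial; repeated draws from the same Dirichlet-sampled multinomial tend to reuse the same few words, which is the burstiness effect the BTMM is designed to model.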

Original language: English
Publication status: Published - 1 December 2007
Event: 19th Conference on Computational Linguistics and Speech Processing, ROCLING 2007 - Taipei, Taiwan
Duration: 6 September 2007 → 7 September 2007

Conference

Conference: 19th Conference on Computational Linguistics and Speech Processing, ROCLING 2007
Country/Territory: Taiwan
City: Taipei
Period: 6/09/07 → 7/09/07
