Bayesian topic mixture model for information retrieval

Meng Sung Wu, Hsuan Jui Hsu, Jen-Tzung Chien

Research output: Contribution to conference, Paper, peer-review

Abstract

In automatic text processing, probabilistic topic models are widely used to infer word correlations through latent topic variables. Probabilistic latent semantic analysis (PLSA) is one such model: each word in a document is treated as a sample from a mixture model whose mixture components are multinomial distributions. Although PLSA handles multiple topics, each topic model is quite simple and does not account for word burstiness. In this study, we present a new Bayesian topic mixture model (BTMM) to overcome the burstiness problem inherent in the multinomial distribution. Accordingly, we use the Dirichlet distribution to represent topic information beyond the document level. Conceptually, the documents in the same class are generated by the associated multinomial distribution. In experiments on the TREC text corpus, we report average precision and model perplexity to demonstrate the superiority of the proposed BTMM method.
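The two ideas the abstract relies on, the PLSA word mixture and the burstiness weakness of a plain multinomial, can be illustrated with a small sketch. This is not the BTMM from the paper: the toy vocabulary, the probabilities, the function names, and the use of a Dirichlet compound multinomial as the Dirichlet-based alternative are all assumptions made purely for illustration.

```python
import math

def plsa_word_prob(p_topic_given_doc, p_word_given_topic, word):
    """PLSA mixture: p(w|d) = sum over topics z of p(z|d) * p(w|z)."""
    return sum(pz * pw[word]
               for pz, pw in zip(p_topic_given_doc, p_word_given_topic))

def multinomial_log_lik(counts, probs):
    """Log-likelihood of a word-count vector under a multinomial."""
    out = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        out += c * math.log(p) - math.lgamma(c + 1)
    return out

def dcm_log_lik(counts, alphas):
    """Log-likelihood under a Dirichlet compound multinomial (assumed
    stand-in for a Dirichlet-based model; not the paper's BTMM)."""
    n, a = sum(counts), sum(alphas)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alphas):
        out += math.lgamma(c + al) - math.lgamma(al) - math.lgamma(c + 1)
    return out

# Hypothetical toy setup: 2 topics over a 3-word vocabulary.
p_topic_given_doc = [0.7, 0.3]
p_word_given_topic = [[0.5, 0.3, 0.2],
                      [0.1, 0.1, 0.8]]
p_w0 = plsa_word_prob(p_topic_given_doc, p_word_given_topic, 0)

# Burstiness: one word repeated 4 times vs. 4 different words once each,
# over a 4-word vocabulary with the same mean word probabilities (0.25).
bursty, spread = [4, 0, 0, 0], [1, 1, 1, 1]
uniform = [0.25] * 4
alphas = [0.1] * 4  # small Dirichlet parameters favour repeated words

print("p(w0|d) =", p_w0)
print("multinomial bursty/spread:",
      multinomial_log_lik(bursty, uniform), multinomial_log_lik(spread, uniform))
print("DCM bursty/spread:",
      dcm_log_lik(bursty, alphas), dcm_log_lik(spread, alphas))
```

Under these assumed settings the multinomial assigns the repeated-word document a lower likelihood than the spread-out one, while the Dirichlet-based model reverses that ordering, which is the burstiness effect the abstract refers to.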

Original language: English
State: Published - 1 Dec 2007
Event: 19th Conference on Computational Linguistics and Speech Processing, ROCLING 2007 - Taipei, Taiwan
Duration: 6 Sep 2007 to 7 Sep 2007

Conference

Conference: 19th Conference on Computational Linguistics and Speech Processing, ROCLING 2007
Country/Territory: Taiwan
City: Taipei
Period: 6/09/07 to 7/09/07

Keywords

  • Bayesian model
  • Dirichlet Prior
  • Graphical model
  • Information retrieval
  • PLSA
