Latent Dirichlet mixture model

Jen-Tzung Chien*, Chao Hsi Lee, Zheng Hua Tan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Text representation based on a latent topic model is treated as a non-Gaussian problem in which the observed words and latent topics are multinomial variables and the topic proportions are Dirichlet variables. A traditional topic model introduces a single Dirichlet prior to characterize the topic proportions. The words in a text document are represented by a random mixture of semantic topics. However, in the real world, a single Dirichlet distribution may not faithfully reflect the variations in topic proportions estimated from heterogeneous documents. To address these variations, we propose a new latent variable model in which latent topics and their proportions are learned with a prior based on a Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a way to build structural latent variables when learning representations over a variety of topics. This study carries out inference for LDMM using variational Bayes and collapsed variational Bayes. The unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of the structural prior in LDMM topic models.
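The abstract does not spell out the generative process, so the following is only a minimal sketch of one plausible reading: each document first picks one component of a Dirichlet mixture, draws its topic proportions from that component, and then generates words as in a standard topic model. The function names, the per-document choice of component, and all parameter values (`mix_weights`, `alphas`, `topic_word`) are assumptions made for illustration and are not taken from the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_document(mix_weights, alphas, topic_word, n_words, rng):
    # Assumption: one Dirichlet component is chosen per document.
    c = rng.choices(range(len(mix_weights)), weights=mix_weights)[0]
    # Topic proportions come from the chosen Dirichlet component.
    theta = sample_dirichlet(alphas[c], rng)
    words = []
    for _ in range(n_words):
        # Latent topic for this word position.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # Observed word index drawn from that topic's word distribution.
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        words.append(w)
    return c, theta, words

# Hypothetical toy setup: 2 Dirichlet components, 3 topics, 4 vocabulary words.
rng = random.Random(0)
mix_weights = [0.5, 0.5]
alphas = [[5.0, 0.5, 0.5], [0.5, 0.5, 5.0]]
topic_word = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
c, theta, words = sample_document(mix_weights, alphas, topic_word, 8, rng)
print(c, [round(t, 3) for t in theta], words)
```

With a single component (`mix_weights = [1.0]`) this sketch reduces to the single-Dirichlet prior that the abstract describes as the traditional setting. Using several components lets different groups of documents favor different regions of the topic simplex.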

Original language: English
Pages (from-to): 12-22
Number of pages: 11
Journal: Neurocomputing
Volume: 278
State: Published - 22 Feb 2018

Keywords

  • Bayesian learning
  • Dirichlet mixture model
  • Topic model
