Clustering tagged documents with labeled and unlabeled documents

Chien-Liang Liu*, Wen Hoar Hsaio, Chia Hoang Lee, Chun Hsien Chen

*Corresponding author for this work

Research output: Article › peer-review

9 Citations (Scopus)

Abstract

This study applies Constrained-PLSA, our proposed semi-supervised clustering method, to cluster tagged documents with the help of a small number of labeled documents, and evaluates system performance on two data sets. In the first data set the boundaries among clusters are not clear, whereas the second has clear boundaries among clusters. Documents are clustered using paper abstracts together with the tags that users annotated them with, and four combinations of tags and words are used as feature representations. The experimental results indicate that almost all of the methods can benefit from tags. However, the unsupervised learning methods fail to function properly on the data set with noisy information, whereas Constrained-PLSA still functions properly. Because background knowledge is readily available in many real applications, it is appropriate to use it in the clustering process to make learning faster and more effective.
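The abstract does not give the Constrained-PLSA update rules. As a rough illustration of the general idea only (PLSA fitted by EM, with the few labeled documents clamped to their known clusters), here is a minimal pure-Python sketch. It is not the authors' formulation. Clamping P(z|d) to a one-hot vector for labeled documents, the uniform initialization, the names `seeded_plsa` and `with_tags`, and the `tag:` prefix used to merge tags with words are all assumptions made for this example.

```python
from collections import Counter


def seeded_plsa(docs, n_topics, labels, n_iter=50):
    """Illustrative PLSA with labeled docs clamped (NOT the paper's Constrained-PLSA).

    docs: list of token lists (words and/or tags)
    labels: dict mapping doc index -> cluster index for the few labeled docs
    Returns the argmax cluster of P(z|d) for every document.
    """
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]

    def one_hot(z_true):
        return [1.0 if z == z_true else 0.0 for z in range(n_topics)]

    # P(w|z): uniform start; the clamped labeled docs break the symmetry.
    p_w_z = [{w: 1.0 / len(vocab) for w in vocab} for _ in range(n_topics)]
    # P(z|d): one-hot for labeled docs (assumed constraint), uniform otherwise.
    p_z_d = [one_hot(labels[i]) if i in labels else [1.0 / n_topics] * n_topics
             for i in range(len(docs))]

    for _ in range(n_iter):
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for i, c in enumerate(counts):
            for w, n in c.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[i][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / total  # expected count n(d,w) * P(z|d,w)
                    acc_w_z[z][w] += r
                    acc_z_d[i][z] += r
        # M-step: renormalize the expected counts.
        for z in range(n_topics):
            total = sum(acc_w_z[z].values())
            if total > 0.0:
                p_w_z[z] = {w: v / total for w, v in acc_w_z[z].items()}
        for i in range(len(docs)):
            if i in labels:
                p_z_d[i] = one_hot(labels[i])  # constraint: keep the given label
            else:
                total = sum(acc_z_d[i])
                if total > 0.0:
                    p_z_d[i] = [v / total for v in acc_z_d[i]]

    return [max(range(n_topics), key=lambda z: p_z_d[i][z]) for i in range(len(docs))]


def with_tags(words, tags):
    # Hypothetical feature combination: words plus tags marked with a "tag:" prefix.
    return list(words) + ["tag:" + t for t in tags]


docs = [
    with_tags(["apple", "banana"], ["fruit"]),  # labeled: cluster 0
    with_tags(["car", "engine"], ["vehicle"]),  # labeled: cluster 1
    with_tags(["banana", "apple"], ["fruit"]),
    with_tags(["engine", "wheel"], ["vehicle"]),
    with_tags(["apple"], ["fruit"]),
    with_tags(["wheel", "car"], ["vehicle"]),
]
print(seeded_plsa(docs, n_topics=2, labels={0: 0, 1: 1}))  # [0, 1, 0, 1, 0, 1]
```

In this sketch the labeled documents do more than fix their own clusters: because every P(w|z) starts out identical, they are the only thing that pulls the topics apart. With no labels at all, every unlabeled document would stay at a uniform P(z|d).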

Original language: English
Pages (from-to): 596-606
Number of pages: 11
Journal: Information Processing and Management
Volume: 49
Issue number: 3
Publication status: Published - 31 Jan 2013