Mining top-K high utility itemsets

Cheng Wei Wu*, Bai En Shie, Vincent Shin-Mu Tseng, Philip S. Yu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

147 Scopus citations

Abstract

Mining high utility itemsets from databases is an emerging topic in data mining, which refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold min-util. Although several studies have been carried out on this topic, setting an appropriate minimum utility threshold is a difficult problem for users. If min-util is set too low, too many high utility itemsets will be generated, which may cause the mining algorithms to become inefficient or even run out of memory. On the other hand, if min-util is set too high, no high utility itemset will be found. Setting appropriate minimum utility thresholds by trial and error is a tedious process for users. In this paper, we address this problem by proposing a new framework named top-k high utility itemset mining, where k is the desired number of high utility itemsets to be mined. An efficient algorithm named TKU (Top-K Utility itemsets mining) is proposed for mining such itemsets without setting min-util. Several features were designed in TKU to solve the new challenges raised in this problem, such as the absence of an anti-monotone property and the requirement of lossless results. Moreover, TKU incorporates several novel strategies for pruning the search space to achieve high efficiency. Results on real and synthetic datasets show that TKU has excellent performance and scalability.
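
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k high utility itemset mining. It is not the TKU algorithm and uses none of its pruning strategies; it enumerates every itemset, so it is only usable on toy data. It assumes the usual definition of utility (purchased quantity times unit profit, summed over every transaction that contains the whole itemset), which the abstract does not spell out. The function names, the dictionary-based transaction layout, and the toy data are all hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Utility of an itemset: quantity * unit profit, summed over
    every transaction that contains all items of the itemset."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def top_k_high_utility_itemsets(transactions, profit, k):
    """Exhaustive baseline (assumed for illustration, not TKU):
    score every non-empty itemset and keep the k best."""
    items = sorted({item for tx in transactions for item in tx})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility > 0:  # skip itemsets that occur in no transaction
                scored.append((utility, itemset))
    # Highest utility first; ties broken by itemset for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical toy database: each transaction maps item -> quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit per item.
profit = {"a": 5, "b": 2, "c": 1}

for utility, itemset in top_k_high_utility_itemsets(transactions, profit, k=3):
    print(itemset, utility)
```

In this toy database the itemset {a, c} has utility 11 while its subset {c} has utility 4, so a superset can outscore its own subset. That is the absence of an anti-monotone property mentioned in the abstract, and it is why support-style pruning cannot be applied to utility directly. The user supplies only k; no min-util value appears anywhere in the call.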

Original language: English
Title of host publication: KDD'12 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 78-86
Number of pages: 9
State: Published - 2012
Event: 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012 - Beijing, China
Duration: 12 Aug 2012 to 16 Aug 2012

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012
Country/Territory: China
City: Beijing
Period: 12/08/12 to 16/08/12

Keywords

  • high utility itemset
  • top-k pattern mining
  • utility mining
