TY - GEN
T1 - Efficient mining of a concise and lossless representation of high utility itemsets
AU - Wu, Cheng Wei
AU - Fournier-Viger, Philippe
AU - Yu, Philip S.
AU - Tseng, S.
PY - 2011
Y1 - 2011
N2 - Mining high utility itemsets from transactional databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g. high profits). Although several studies have been carried out, current methods may present too many high utility itemsets for users, which degrades the performance of the mining task in terms of execution and memory efficiency. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework in this paper for mining closed+ high utility itemsets, which serves as a compact and lossless representation of high utility itemsets. We present an efficient algorithm called CHUD (Closed+ High Utility itemset Discovery) for mining closed+ high utility itemsets. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all high utility itemsets from the set of closed+ high utility itemsets without accessing the original database. Results of experiments on real and synthetic datasets show that CHUD and DAHU are very efficient with a massive reduction (up to 800 times in our experiments) in the number of high utility itemsets. In addition, when all high utility itemsets are recovered by DAHU, the approach combining CHUD and DAHU also outperforms the state-of-the-art algorithms in mining high utility itemsets.
AB - Mining high utility itemsets from transactional databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g. high profits). Although several studies have been carried out, current methods may present too many high utility itemsets for users, which degrades the performance of the mining task in terms of execution and memory efficiency. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework in this paper for mining closed+ high utility itemsets, which serves as a compact and lossless representation of high utility itemsets. We present an efficient algorithm called CHUD (Closed+ High Utility itemset Discovery) for mining closed+ high utility itemsets. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all high utility itemsets from the set of closed+ high utility itemsets without accessing the original database. Results of experiments on real and synthetic datasets show that CHUD and DAHU are very efficient with a massive reduction (up to 800 times in our experiments) in the number of high utility itemsets. In addition, when all high utility itemsets are recovered by DAHU, the approach combining CHUD and DAHU also outperforms the state-of-the-art algorithms in mining high utility itemsets.
KW - Closed high utility itemset
KW - Frequent itemset
KW - Lossless and concise representation
KW - Utility mining
UR - http://www.scopus.com/inward/record.url?scp=84863129519&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2011.60
DO - 10.1109/ICDM.2011.60
M3 - Conference contribution
AN - SCOPUS:84863129519
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 824
EP - 833
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -