TY - GEN
T1 - Mining high utility episodes in complex event sequences
AU - Wu, Cheng Wei
AU - Lin, Yu Feng
AU - Yu, Philip S.
AU - Tseng, Vincent Shin-Mu
N1 - Publisher Copyright:
Copyright © 2013 ACM.
PY - 2013/8/11
Y1 - 2013/8/11
N2 - Frequent episode mining (FEM) is an interesting research topic in data mining with wide range of applications. However, the traditional framework of FEM treats all events as having the same importance/utility and assumes that a same type of event appears at most once at any time point. These simplifying assumptions do not reflect the characteristics of scenarios in real applications and thus the useful information of episodes in terms of utilities such as profits is lost. Furthermore, most studies on FEM focused on mining episodes in simple event sequences and few considered the scenario of complex event sequences, where different events can occur simultaneously. To address these issues, in this paper, we incorporate the concept of utility into episode mining and address a new problem of mining high utility episodes from complex event sequences, which has not been explored so far. In the proposed framework, the importance/utility of different events is considered and multiple events can appear simultaneously. Several novel features are incorporated into the proposed framework to resolve the challenges raised by this new problem, such as the absence of antimonotone property and the huge set of candidate episodes. Moreover, an efficient algorithm named UP-Span (Utility ePisodes mining by Spanning prefixes) is proposed for mining high utility episodes with several strategies incorporated for pruning the search space to achieve high efficiency. Experimental results on real and synthetic datasets show that UP-Span has excellent performance and serves as an effective solution to the new problem of mining high utility episodes from complex event sequences.
AB - Frequent episode mining (FEM) is an interesting research topic in data mining with wide range of applications. However, the traditional framework of FEM treats all events as having the same importance/utility and assumes that a same type of event appears at most once at any time point. These simplifying assumptions do not reflect the characteristics of scenarios in real applications and thus the useful information of episodes in terms of utilities such as profits is lost. Furthermore, most studies on FEM focused on mining episodes in simple event sequences and few considered the scenario of complex event sequences, where different events can occur simultaneously. To address these issues, in this paper, we incorporate the concept of utility into episode mining and address a new problem of mining high utility episodes from complex event sequences, which has not been explored so far. In the proposed framework, the importance/utility of different events is considered and multiple events can appear simultaneously. Several novel features are incorporated into the proposed framework to resolve the challenges raised by this new problem, such as the absence of antimonotone property and the huge set of candidate episodes. Moreover, an efficient algorithm named UP-Span (Utility ePisodes mining by Spanning prefixes) is proposed for mining high utility episodes with several strategies incorporated for pruning the search space to achieve high efficiency. Experimental results on real and synthetic datasets show that UP-Span has excellent performance and serves as an effective solution to the new problem of mining high utility episodes from complex event sequences.
KW - Complex event sequences
KW - Episode mining
KW - High utility episodes
KW - Utility mining
UR - http://www.scopus.com/inward/record.url?scp=84977768501&partnerID=8YFLogxK
U2 - 10.1145/2487575.2487654
DO - 10.1145/2487575.2487654
M3 - Conference contribution
AN - SCOPUS:84977768501
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 536
EP - 544
BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2 - Parekh, Rajesh
A2 - He, Jingrui
A2 - Inderjit, Dhillon S.
A2 - Bradley, Paul
A2 - Koren, Yehuda
A2 - Ghani, Rayid
A2 - Senator, Ted E.
A2 - Grossman, Robert L.
A2 - Uthurusamy, Ramasamy
PB - Association for Computing Machinery
T2 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Y2 - 11 August 2013 through 14 August 2013
ER -