TY - GEN
T1 - Multi-level grid based clustering and GPU parallel implementations
AU - Qian, Quan
AU - Zhao, Shuai
AU - Xiao, Chao Jie
AU - Hung, Che Lun
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/27
Y1 - 2017/11/27
N2 - Clustering of stream data, one of the stream data mining techniques, has extensive applications in network traffic analysis, telecommunication, planetary remote sensing, web site analysis, and other areas. Stream clustering demands real-time processing, but current clustering algorithms for stream data, such as Clustream and Dstream, are all based on sequential algorithms and cannot meet the real-time requirement. In this paper, we propose a multi-grid based clustering algorithm for stream data. Building on the conventional grid-based DBSCAN clustering algorithm, the algorithm partitions the grid space so as to limit the search scope of grid neighbours, which accelerates processing. In addition, we use CUDA for parallel computation to further speed up processing. Experiments on the KDDCUP-99 open testing dataset show that the proposed algorithm is 10 times faster than the conventional grid-based algorithm, and that the CUDA-based implementation achieves a speedup of 3 over the same algorithm executed on the CPU.
KW - Clustering algorithm for stream data
KW - GPU parallelization
KW - Multi-grid
UR - http://www.scopus.com/inward/record.url?scp=85048256615&partnerID=8YFLogxK
U2 - 10.1109/ISPAN-FCST-ISCC.2017.75
DO - 10.1109/ISPAN-FCST-ISCC.2017.75
M3 - Conference contribution
AN - SCOPUS:85048256615
T3 - Proceedings - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
SP - 397
EP - 402
BT - Proceedings - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
Y2 - 21 June 2017 through 23 June 2017
ER -