TY - GEN
T1 - Online Chinese restaurant process
AU - Liu, Chien-Liang
AU - Tsai, Tsung Hsun
AU - Lee, Chia Hoang
PY - 2014
Y1 - 2014
AB - Processing large volumes of streaming data in near-real-time is becoming increasingly important as the Internet, sensor networks and network traffic grow. Online machine learning is a typical means of dealing with streaming data, since it allows the classification model to learn one instance of data at a time. Although many online learning methods have been developed since the introduction of the Perceptron algorithm, existing online methods assume that the number of classes is available in advance of the classification process. However, this assumption is unrealistic for large-scale or streaming data sets. This work proposes an online Chinese restaurant process (CRP) algorithm, which is an online and nonparametric algorithm, to tackle this problem. This work proposes a relaxing function as part of the prior and updates the parameters with the likelihood function in terms of the consistency between the true label information and the predicted result. This work presents two Gibbs sampling algorithms to perform posterior inference. In the experiments, the online CRP is applied to three massive data sets and compared with several online learning and batch learning algorithms. One of the data sets is obtained from Wikipedia, which comprises approximately two million documents. The experimental results reveal that the proposed online CRP performs well and efficiently on massive data sets. Finally, this work proposes two methods to update the hyperparameter α of the online CRP. The first method is based on the posterior distribution of α, and the second exploits the property of online learning, namely adapting to change, to adjust α dynamically.
KW - adaptive learning
KW - Chinese restaurant process
KW - nonparametric
KW - online learning
UR - http://www.scopus.com/inward/record.url?scp=84907030846&partnerID=8YFLogxK
U2 - 10.1145/2623330.2623636
DO - 10.1145/2623330.2623636
M3 - Conference contribution
AN - SCOPUS:84907030846
SN - 9781450329569
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 591
EP - 600
BT - KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Y2 - 24 August 2014 through 27 August 2014
ER -