TY - GEN
T1 - Distributed Consensus Reduced Support Vector Machine
AU - Chen, Hsiang Hsuan
AU - Lee, Yuh Jye
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - Nowadays, machine learning performs astonishingly well in many different fields. In general, the more data we have, the better results our machine learning methods will achieve. However, in many situations, data owners may be unwilling or not allowed to share their data because of legal issues or privacy concerns, even though pooling all the data together as training data would yield a better result. In another situation, we encounter an extremely large dataset that is difficult to store on a single machine, so we may utilize multiple computing units to handle it. To deal with these two problems, we propose the distributed consensus reduced support vector machine (DCRSVM), a nonlinear model for binary classification. We apply the Alternating Direction Method of Multipliers (ADMM) to solve the DCRSVM. In each iteration, each local worker updates its model by incorporating the information shared by the master; the local workers share only their models and never their data. The master fuses the local models reported by the workers and, at the end, generates a consensus model that is almost identical to the model obtained by pooling all the data together, which is not allowed in many real-world applications.
KW - Distributed Machine Learning
KW - Large-Scale Machine Learning
KW - Privacy Preserving
UR - http://www.scopus.com/inward/record.url?scp=85081412112&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006098
DO - 10.1109/BigData47090.2019.9006098
M3 - Conference contribution
AN - SCOPUS:85081412112
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 5718
EP - 5727
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -