TY - JOUR
T1 - Clustering data with partial background information
AU - Liu, Chien-Liang
AU - Hsaio, Wen Hoar
AU - Chang, Tao Hsing
AU - Li, Hsuan Hsun
N1 - Publisher Copyright: © 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/5/1
Y1 - 2019/5/1
N2 - Clustering with partial background information, or semi-supervised clustering, which learns from a combination of labeled and unlabeled data, has received a lot of attention over the last decade. The supervisory information is usually used as constraints to bias clustering towards a good region of the search space. This paper proposes a semi-supervised algorithm, called constrained non-negative matrix factorization (Constrained-NMF), that uses a few labeled examples as constraints to improve performance. The proposed algorithm is a matrix factorization algorithm, in which the matrices must be initialized at the beginning. Although the benefits of good initialization are well known, randomized seeding of the basis matrix and coefficient matrix is still the standard approach for many non-negative matrix factorization (NMF) algorithms. This work devises an algorithm called entropy-based weighted semi-supervised fuzzy c-means (EWSS-FCM) to initialize the seeds. The experimental results indicate that the proposed Constrained-NMF can benefit from the initialization obtained from EWSS-FCM, which emphasizes the role of labeled examples and automatically weights them during the course of clustering. Both algorithms incorporate labeled examples in their objective functions, and the labeled information is propagated to unlabeled examples iteratively. We further analyze the proposed Constrained-NMF and give convergence justifications. The experiments are conducted on five real data sets, and the results indicate that the proposed algorithm generally outperforms the alternatives.
AB - Clustering with partial background information, or semi-supervised clustering, which learns from a combination of labeled and unlabeled data, has received a lot of attention over the last decade. The supervisory information is usually used as constraints to bias clustering towards a good region of the search space. This paper proposes a semi-supervised algorithm, called constrained non-negative matrix factorization (Constrained-NMF), that uses a few labeled examples as constraints to improve performance. The proposed algorithm is a matrix factorization algorithm, in which the matrices must be initialized at the beginning. Although the benefits of good initialization are well known, randomized seeding of the basis matrix and coefficient matrix is still the standard approach for many non-negative matrix factorization (NMF) algorithms. This work devises an algorithm called entropy-based weighted semi-supervised fuzzy c-means (EWSS-FCM) to initialize the seeds. The experimental results indicate that the proposed Constrained-NMF can benefit from the initialization obtained from EWSS-FCM, which emphasizes the role of labeled examples and automatically weights them during the course of clustering. Both algorithms incorporate labeled examples in their objective functions, and the labeled information is propagated to unlabeled examples iteratively. We further analyze the proposed Constrained-NMF and give convergence justifications. The experiments are conducted on five real data sets, and the results indicate that the proposed algorithm generally outperforms the alternatives.
KW - Clustering
KW - Fuzzy clustering
KW - Non-negative matrix factorization (NMF)
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85058059621&partnerID=8YFLogxK
U2 - 10.1007/s13042-018-0790-0
DO - 10.1007/s13042-018-0790-0
M3 - Article
AN - SCOPUS:85058059621
SN - 1868-8071
VL - 10
SP - 1123
EP - 1138
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 5
ER -