TY - GEN
T1 - Unsupervised methods for Software Defect Prediction
AU - Ha, Duy An
AU - Chen, Ting Hsuan
AU - Yuan, Shyan-Ming
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/12/4
Y1 - 2019/12/4
AB - Software Defect Prediction (SDP) aims to assess software quality using machine learning techniques. Recently, Zhang et al. proposed a connectivity-based unsupervised learning method and showed that unsupervised classification has great potential for this problem. Inspired by this idea, we attempt to replicate the results of Zhang et al.'s experiment and to improve performance by examining different techniques at each step of their unsupervised approach to SDP. Specifically, we follow the experimental steps described in their work as strictly as possible, and we examine three other clustering methods together with four other feature selection approaches besides using all features. To the best of our knowledge, this is the first time these methods have been applied to SDP to evaluate their predictive power. In our replication, the results are generally not as good as those reported in the previous work, possibly because we do not know exactly which features were used in their experiment. In our experiments, Fluid clustering and spectral clustering give better results than Newman clustering and CNM clustering. Additionally, the experiments show that using Kernel Principal Component Analysis (KPCA) or Non-Negative Matrix Factorization (NMF) in the feature selection step gives better performance than using all features in the case of unlabeled data. Lastly, to make our work easy to replicate, we have created a lightweight framework and released it on GitHub.
KW - Community structure detection
KW - Machine learning
KW - Software Defect Prediction
KW - Software engineering
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85077813526&partnerID=8YFLogxK
U2 - 10.1145/3368926.3369711
DO - 10.1145/3368926.3369711
M3 - Conference contribution
AN - SCOPUS:85077813526
T3 - ACM International Conference Proceeding Series
SP - 49
EP - 55
BT - Proceedings of the 10th International Symposium on Information and Communication Technology, SoICT 2019
PB - Association for Computing Machinery
T2 - 10th International Symposium on Information and Communication Technology, SoICT 2019
Y2 - 4 December 2019 through 6 December 2019
ER -