Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search

Chun Cheng Lin, Jia Rong Kang*, Yu Lin Liang, Chih Chi Kuo


Research output: Article › peer-review

7 Citations (Scopus)


In smart factories, the data collected by Internet-of-Things sensors is enormous and contains substantial noise and many missing values. To address this big-data problem, metaheuristics are one of the main approaches to data preprocessing, i.e., selecting instances or features before training the model. Previous metaheuristic approaches have rarely considered instance selection and feature selection simultaneously, and have rarely focused on big noisy data. This work therefore proposes a hybrid memetic algorithm (MA) with variable neighborhood search (VNS) that selects instances and features simultaneously: the MA performs well in data selection, and VNS has been shown to perform well in local search. To evaluate the proposed algorithm, this work creates simulation data by combining datasets from the UCI Machine Learning Repository with noisy data. The proposed algorithm reduces the simulation data, and the reduced data is then used to train a predictive model whose test performance is evaluated. Compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. Additionally, the results show that the proposed algorithm is more robust than other feature selection methods.
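To make the idea of a memetic algorithm with VNS local search concrete, here is a minimal sketch, not the authors' implementation. It assumes one binary chromosome holding feature bits followed by instance bits, a fitness of 1-NN validation accuracy minus a small penalty on the fraction of bits kept, uniform crossover with single-bit mutation, and a basic VNS whose k-th neighborhood flips k random bits followed by a short first-improvement bit-flip search. All function names (`memetic_vns`, `evaluate`, etc.), parameter values, and the synthetic data (one informative feature plus three pure-noise features, standing in for the paper's UCI-plus-noise data) are hypothetical.

```python
import random

def knn1_accuracy(train, val, feat_mask, inst_mask):
    """1-NN validation accuracy using only the selected features and instances."""
    feats = [j for j, b in enumerate(feat_mask) if b]
    protos = [train[i] for i, b in enumerate(inst_mask) if b]
    if not feats or not protos:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for x, y in val:
        # nearest retained instance under squared Euclidean distance on kept features
        _, label = min(protos, key=lambda p: sum((p[0][j] - x[j]) ** 2 for j in feats))
        correct += label == y
    return correct / len(val)

def evaluate(bits, d, train, val, alpha=0.01):
    """Hypothetical fitness: accuracy minus a penalty on the fraction of bits kept."""
    acc = knn1_accuracy(train, val, bits[:d], bits[d:])
    return acc - alpha * sum(bits) / len(bits)

def shake(bits, k, rng):
    """k-th VNS neighborhood: flip k distinct random bits."""
    out = bits[:]
    for i in rng.sample(range(len(bits)), k):
        out[i] ^= 1
    return out

def local_search(bits, score, f, rng, tries=10):
    """First-improvement search over a random sample of single-bit flips."""
    for i in rng.sample(range(len(bits)), min(tries, len(bits))):
        cand = bits[:]
        cand[i] ^= 1
        s = f(cand)
        if s > score:
            bits, score = cand, s
    return bits, score

def vns(bits, score, f, rng, k_max=3, max_steps=5):
    """Basic VNS with a step cap to bound the cost per offspring."""
    k, steps = 1, 0
    while k <= k_max and steps < max_steps:
        steps += 1
        shaken = shake(bits, k, rng)
        cand, s = local_search(shaken, f(shaken), f, rng)
        if s > score:
            bits, score, k = cand, s, 1  # improvement: restart at the smallest neighborhood
        else:
            k += 1  # no improvement: try a larger neighborhood
    return bits, score

def memetic_vns(train, val, pop_size=10, generations=10, seed=0):
    rng = random.Random(seed)
    d = len(train[0][0])
    n = d + len(train)
    f = lambda b: evaluate(b, d, train, val)
    # seed the population with the "keep everything" solution plus random ones
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    scores = [f(b) for b in pop]
    for _ in range(generations):
        # binary tournament selection of two parents
        p1, p2 = (max(rng.sample(range(pop_size), 2), key=lambda i: scores[i]) for _ in range(2))
        child = [pop[p1][i] if rng.random() < 0.5 else pop[p2][i] for i in range(n)]
        child = shake(child, 1, rng)  # mutation: one random bit flip
        child, s = vns(child, f(child), f, rng)  # memetic step: VNS refinement
        worst = min(range(pop_size), key=lambda i: scores[i])
        if s > scores[worst]:  # steady-state replacement keeps the best solution
            pop[worst], scores[worst] = child, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best][:d], pop[best][d:], scores[best]

# Hypothetical noisy data: feature 0 carries the label, features 1-3 are noise.
data_rng = random.Random(42)
def make(n):
    rows = []
    for _ in range(n):
        y = data_rng.randint(0, 1)
        rows.append(([y + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(3)], y))
    return rows

train, val = make(30), make(20)
feat_mask, inst_mask, score = memetic_vns(train, val)
print(feat_mask, sum(inst_mask), round(score, 3))
```

The MA supplies global exploration through crossover across the population, while VNS supplies exploitation around each offspring; because only the worst member is replaced and only by a strictly fitter child, the best fitness never decreases. A real study would replace the 1-NN fitness and penalty weight with whatever classifier and objective the experiment actually uses.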

Journal: Applied Soft Computing
Publication status: Published - Nov 2021

