Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search

Chun Cheng Lin, Jia Rong Kang*, Yu Lin Liang, Chih Chi Kuo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

In smart factories, the data collected by Internet-of-Things sensors is enormous and contains a great deal of noise and many missing values. To address this big-data problem, metaheuristics are one of the main approaches to data preprocessing, i.e., selecting instances or features before the model is trained. Previous metaheuristic approaches rarely considered instance selection and feature selection simultaneously, and rarely focused on big noisy data. This work therefore proposes a hybrid of a memetic algorithm (MA) and variable neighborhood search (VNS) that selects instances and features simultaneously: the MA is used because it performs excellently in data selection, and VNS because it has been shown to perform well in local search. To evaluate the proposed algorithm, this work creates simulation data by combining datasets from the UCI repository with noisy data. The proposed algorithm first reduces the simulation data by selecting features and instances at the same time; the reduced data is then used to train a predictive model, whose test performance serves as the evaluation. Compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. Additionally, the results show that the proposed algorithm is more robust than other feature selection methods.
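The abstract does not specify the solution encoding, fitness function, classifier, or genetic operators, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes: a chromosome is one flat 0/1 list whose first bits select training instances and whose remaining bits select features; fitness is a weighted sum (hypothetical weight `alpha=0.9`) of 1-nearest-neighbour validation accuracy and the fraction of data removed; the MA is a plain genetic algorithm (tournament selection, uniform crossover, bit-flip mutation, elitism) that applies a basic VNS to every offspring, with neighbourhood N_k defined as "flip k random bits". All names (`memetic_vns`, `vns`, `fitness`, `make_noisy_data`) are hypothetical, and the toy generator only stands in for "UCI data combined with noise".

```python
import random

# Hypothetical encoding: flat 0/1 list; first n_inst bits select training
# instances, remaining n_feat bits select features.

def knn1_accuracy(X_tr, y_tr, X_val, y_val, inst, feat):
    """Validation accuracy of a 1-NN classifier built on the selected rows/columns."""
    rows = [i for i, b in enumerate(inst) if b]
    cols = [j for j, b in enumerate(feat) if b]
    if not rows or not cols:
        return 0.0
    correct = 0
    for x, y in zip(X_val, y_val):
        nearest = min(rows, key=lambda i: sum((X_tr[i][j] - x[j]) ** 2 for j in cols))
        correct += y_tr[nearest] == y
    return correct / len(X_val)

def fitness(data, inst, feat, alpha=0.9):
    """Assumed objective: weighted sum of accuracy and data reduction, in [0, 1]."""
    acc = knn1_accuracy(*data, inst, feat)
    reduction = 1 - (sum(inst) + sum(feat)) / (len(inst) + len(feat))
    return alpha * acc + (1 - alpha) * reduction

def flip(bits, k, rng):
    """Neighbourhood N_k: flip k distinct randomly chosen bits."""
    out = bits[:]
    for p in rng.sample(range(len(bits)), k):
        out[p] ^= 1
    return out

def vns(bits, f, rng, k_max=3, ls_tries=5, max_iter=20):
    """Basic VNS: shake in N_k, hill-climb with random single-bit flips,
    re-centre and reset k to 1 on improvement, otherwise move to N_{k+1}."""
    best, best_f = bits, f(bits)
    k, it = 1, 0
    while k <= k_max and it < max_iter:
        it += 1
        cand = flip(best, k, rng)  # shaking step
        cand_f = f(cand)
        for _ in range(ls_tries):  # sampled single-flip local search
            nb = flip(cand, 1, rng)
            nb_f = f(nb)
            if nb_f > cand_f:
                cand, cand_f = nb, nb_f
        if cand_f > best_f:
            best, best_f, k = cand, cand_f, 1
        else:
            k += 1
    return best, best_f

def memetic_vns(data, pop_size=8, generations=10, p_mut=0.05, seed=0):
    """GA (tournament, uniform crossover, bit-flip mutation, elitism)
    with VNS applied to every offspring as the memetic step."""
    n_inst, n_feat = len(data[0]), len(data[0][0])
    n = n_inst + n_feat
    rng = random.Random(seed)

    def f(bits):
        return fitness(data, bits[:n_inst], bits[n_inst:])

    # Include the unreduced data; with elitism the returned fitness never drops below it.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    scored = sorted(((f(b), b) for b in pop), key=lambda t: t[0], reverse=True)

    def tournament():
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        children = [scored[0]]  # elitism: carry the incumbent over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]      # bit-flip mutation
            child, cf = vns(child, f, rng)                                      # memetic local search
            children.append((cf, child))
        scored = sorted(children, key=lambda t: t[0], reverse=True)
    best_f, best = scored[0]
    return best[:n_inst], best[n_inst:], best_f

def make_noisy_data(n, n_noise_feat, label_noise, rng):
    """Toy stand-in for noisy data: 2 informative features, extra pure-noise
    features, and a fraction of flipped labels."""
    X, y = [], []
    for _ in range(n):
        c = rng.randint(0, 1)
        x = [rng.gauss(2.0 * c, 1.0), rng.gauss(2.0 * c, 1.0)]
        x += [rng.gauss(0.0, 3.0) for _ in range(n_noise_feat)]
        X.append(x)
        y.append(c if rng.random() > label_noise else 1 - c)
    return X, y

if __name__ == "__main__":
    rng = random.Random(42)
    X_tr, y_tr = make_noisy_data(30, 3, 0.15, rng)
    X_val, y_val = make_noisy_data(20, 3, 0.0, rng)
    inst, feat, score = memetic_vns((X_tr, y_tr, X_val, y_val))
    print(f"kept {sum(inst)}/{len(inst)} instances, feature mask {feat}, fitness {score:.3f}")
```

Concatenating both masks into one chromosome is what makes the selection simultaneous in this sketch: crossover, mutation, and every VNS move can add or drop instances and features in the same step. A faithful evaluation would also score the final reduced data on a separate test set rather than on the validation set used inside the fitness.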

Original language: English
Article number: 107855
Journal: Applied Soft Computing
Volume: 112
State: Published - Nov 2021

Keywords

  • Big data analysis
  • Feature selection
  • Instance selection
  • Metaheuristic
  • Noisy data
