Hypothesis combination using genetic algorithm

Kuan Yu Lai*, Jhin Wun Wang, Chien-Liang Liu

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Improving machine learning performance is a perennial concern in the machine learning community. Ensemble learning has been shown to improve model performance by combining multiple algorithms in a single model; most winners of large-scale data science competitions use ensemble learning techniques to achieve high scores. Given a set of available machine learning algorithms, ensemble learning involves two tasks: hypothesis selection and hypothesis combination. Hypothesis selection chooses the algorithms that benefit model performance for inclusion in the ensemble, while hypothesis combination determines the combination coefficients of the algorithms in the model. This work focuses on the hypothesis combination problem, which we formulate as an optimization problem and propose to solve with a genetic algorithm (GA). The GA is a search heuristic for solving optimization problems and belongs to the field of evolutionary computation. It is inspired by the theory of natural evolution and comprises several components, including selection, crossover, and mutation. A typical GA involves five phases: initial population, fitness function, selection, crossover, and mutation. The GA begins with an initial population, which can be regarded as a set of candidate solutions. Next, a fitness function is defined to score each individual. Once the fitness of each individual is computed, selection and crossover are used to generate fit offspring, and a small fraction of the offspring's genes mutate with low probability. The whole process repeats until the population converges. This work follows the GA process to encode the problem and defines an appropriate fitness function according to the characteristics of the problem.
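As a rough illustration (not code from the paper), the five GA phases described above can be sketched on the classic OneMax toy problem, where an individual is a bit string and its fitness is simply the number of 1-bits; all names and parameter values here are illustrative choices:

```python
import random

random.seed(0)

POP_SIZE, N_GENES, GENERATIONS = 20, 10, 60
MUT_RATE = 0.05  # per-gene mutation probability

def fitness(chrom):
    # Toy fitness function: count of 1-bits (OneMax).
    return sum(chrom)

def select(pop):
    # Tournament selection: keep the fitter of two random individuals.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover between two parents.
    point = random.randint(1, N_GENES - 1)
    return p1[:point] + p2[point:]

def mutate(chrom):
    # Flip each gene independently with a small probability.
    return [g ^ 1 if random.random() < MUT_RATE else g for g in chrom]

# Phase 1: initial population (a set of random candidate solutions).
pop = [[random.randint(0, 1) for _ in range(N_GENES)] for _ in range(POP_SIZE)]

# Phases 2-5 repeat: score, select, cross over, mutate.
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
```

In practice the loop would also track convergence (e.g. stop when the best fitness stops improving) rather than run a fixed number of generations.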
We conduct experiments to evaluate the proposed approach on a multi-class classification dataset, Red Wine Quality, which we transform into a binary classification problem. The hypothesis pool contains ten classification algorithms: Support Vector Machine, Random Forest, Decision Tree, Gradient Boosting, AdaBoost, Gaussian Naïve Bayes, Logistic Regression, Nu-Support Vector Machine, Stochastic Gradient Descent, and Nearest Centroid. To determine the combination coefficient of each model, the continuous coefficients are approximated by a binary chromosome encoding, and the F1 score serves as the fitness function. Using the F1 score as the evaluation metric, we compare the proposed method with several alternatives. The experimental results indicate that the proposed method is competitive in determining the combination coefficients of the algorithms in the ensemble.
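The encoding and fitness described above can be sketched as follows. This is an illustrative reconstruction with simulated base-classifier predictions, not the paper's implementation: each combination coefficient is approximated by a fixed number of binary genes decoded into [0, 1], and the F1 score of the weighted ensemble vote is the fitness the GA maximizes.

```python
import random

random.seed(1)

# Hypothetical setup: preds[m][i] is base model m's 0/1 prediction on
# validation sample i; y_true holds the true labels. Data is simulated.
N_MODELS, N_SAMPLES, BITS = 4, 200, 8  # BITS = binary precision per coefficient

y_true = [random.randint(0, 1) for _ in range(N_SAMPLES)]
accs = [0.9, 0.75, 0.6, 0.55]  # simulated base-classifier accuracies
preds = [[y if random.random() < a else 1 - y for y in y_true] for a in accs]

def decode(chrom):
    # Approximate each continuous coefficient with BITS binary genes.
    weights = []
    for m in range(N_MODELS):
        bits = chrom[m * BITS:(m + 1) * BITS]
        weights.append(int("".join(map(str, bits)), 2) / (2 ** BITS - 1))
    return weights

def f1_fitness(chrom):
    # F1 score of the weighted ensemble vote under these coefficients.
    w = decode(chrom)
    total = sum(w) or 1.0
    tp = fp = fn = 0
    for i, y in enumerate(y_true):
        score = sum(w[m] * preds[m][i] for m in range(N_MODELS)) / total
        pred = 1 if score >= 0.5 else 0
        tp += pred and y
        fp += pred and not y
        fn += (not pred) and y
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def select(pop):
    # Tournament selection on F1 fitness.
    a, b = random.sample(pop, 2)
    return a if f1_fitness(a) >= f1_fitness(b) else b

CHROM_LEN = N_MODELS * BITS
pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(12)]
for _ in range(15):
    new = []
    for _ in range(12):
        p1, p2 = select(pop), select(pop)
        cut = random.randint(1, CHROM_LEN - 1)  # single-point crossover
        child = [g ^ 1 if random.random() < 0.02 else g
                 for g in p1[:cut] + p2[cut:]]  # low-rate mutation
        new.append(child)
    pop = new

best = max(pop, key=f1_fitness)
coefficients = decode(best)
```

In the paper's setting, `preds` would come from the ten trained classifiers on a held-out set, and the decoded `coefficients` would weight their votes at prediction time.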

Original language: English
Pages (from-to): 510-511
Number of pages: 2
Journal: Proceedings of the International Conference on Industrial Engineering and Operations Management
Volume: 2019
Issue number: MAR
State: Published - Mar 2019
Event: 9th International Conference on Industrial Engineering and Operations Management, IEOM 2019 - Bangkok, Thailand
Duration: 5 Mar 2019 - 7 Mar 2019

Keywords

  • Combination coefficients
  • Ensemble learning
  • Genetic algorithm
  • Hypothesis combination
