TY - JOUR
T1 - Using dual response surface methodology as a benchmark to process multi-class imbalanced data
AU - Tong, Lee-Ing
AU - Chang, Kuei Hu
AU - Wu, Ping Yi
AU - Chang, Yung-Chia
N1 - Publisher Copyright:
© 2016 Chinese Institute of Industrial Engineers.
PY - 2017/2/17
Y1 - 2017/2/17
AB - Constructing a classification model for multi-class data is a critical problem in many areas. In practical applications, multi-class data are often imbalanced, which may result in a classification model with a high overall accuracy rate but a low accuracy rate for the minority class. However, in practice the minority class is usually more important than the other classes. This study integrates dual response surface methodology, logistic regression analysis, and the desirability function to develop an optimal re-sampling strategy for classifying multi-class imbalanced data, one that effectively improves the low classification accuracy rate of the minority class(es) while still maintaining a certain accuracy rate for the majority class(es). Three datasets drawn from the KEEL database were used in the numerical experiments. The results showed that, compared with previous work, the proposed method can effectively improve the low classification accuracy rate of the minority class.
KW - Multi-class imbalanced data
KW - design of experiments
KW - dual response surface methodology
KW - re-sampling strategy
UR - http://www.scopus.com/inward/record.url?scp=85007427931&partnerID=8YFLogxK
U2 - 10.1080/21681015.2016.1268216
DO - 10.1080/21681015.2016.1268216
M3 - Article
AN - SCOPUS:85007427931
SN - 2168-1015
VL - 34
SP - 147
EP - 158
JO - Journal of Industrial and Production Engineering
JF - Journal of Industrial and Production Engineering
IS - 2
ER -