Learning from imbalanced data is an important and challenging topic in machine learning. Many methods have been devised to cope with imbalanced data, but most consider only the minority class or only the majority class, without accounting for the relationship between the two. In addition, many methods based on the synthetic minority oversampling technique (SMOTE) generate synthetic samples in the original feature space and use the Euclidean distance to search for nearest neighbors. However, the Euclidean distance is not a reliable distance metric in high-dimensional spaces. This article proposes a novel method, called deep density hybrid sampling (DDHS), to address imbalanced data problems. The proposed method learns an embedding network that projects the data samples into a low-dimensional, separable latent space. The goal is to preserve class proximity during projection, and the loss functions are built on within-class and between-class concepts. Density is used as the criterion for selecting both minority and majority samples. A feature-level approach is then applied to the selected minority samples to generate diverse and valid synthetic samples for the minority class. Extensive experiments compare the proposed method with several existing methods, and the results show that it yields promising and stable results. Because the proposed method is a data-level algorithm, we also combine it with boosting to develop a method called DDHS-boosting. Compared with several ensemble methods, DDHS-boosting also shows promising results.
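The density-based selection and feature-level generation steps described above can be illustrated with a minimal sketch. This is not the article's implementation: the abstract does not specify the density estimator, the keep ratios, or how synthetic samples are validated, so everything below is an assumption made for illustration. The function names (`knn_density`, `select_dense`, `feature_level_samples`), the k-nearest-neighbor density proxy, and the ratios 0.5 and 0.75 are all hypothetical. The embedding network is omitted, and the random arrays simply stand in for points that have already been projected into a latent space.

```python
import numpy as np


def knn_density(points, k=5):
    """Density proxy (assumed): inverse mean distance to the k nearest neighbors."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance, so skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def select_dense(points, keep_ratio, k=5):
    """Keep the keep_ratio fraction of points with the highest density proxy."""
    n_keep = max(1, int(round(keep_ratio * len(points))))
    order = np.argsort(-knn_density(points, k))
    return points[order[:n_keep]]


def feature_level_samples(minority, n_new, rng):
    """Draw each feature of each synthetic sample independently from the
    values that feature takes among the selected minority samples."""
    n, d = minority.shape
    rows = rng.integers(0, n, size=(n_new, d))
    # Fancy indexing: result[i, j] = minority[rows[i, j], j].
    return minority[rows, np.arange(d)]


# Stand-in latent-space data (hypothetical; a trained embedding would supply these).
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(200, 2))
minority = rng.normal(3.0, 0.5, size=(20, 2))

# Hybrid sampling: drop low-density points from both classes (assumed ratios),
# then oversample the minority class until the two classes have equal size.
kept_majority = select_dense(majority, keep_ratio=0.5)
kept_minority = select_dense(minority, keep_ratio=0.75)
n_needed = len(kept_majority) - len(kept_minority)
synthetic = feature_level_samples(kept_minority, n_needed, rng)
balanced_minority = np.vstack([kept_minority, synthetic])
```

In this sketch every synthetic feature value is copied from a retained minority sample, so the generated points stay within the per-feature range of the dense minority region. A check that each synthetic sample is valid (for example, that it lies closer to the minority class than to the majority class) would be a natural addition, but its form here would be a further assumption.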
|Journal||IEEE Transactions on Systems, Man, and Cybernetics: Systems|
|Publication status||Accepted/In press - Mar 2022|