Classification methods of imbalanced data for multi-class classification task

UMO-2015/19/B/ST6/01597, funded by the Polish National Science Center with a budget of 440 044 zł

Project leader: prof. dr hab. inż. Michał Woźniak

Machine Learning Team (Zespół Uczenia Maszynowego)

About the project

This project covers the design of efficient machine learning methods for multi-class scenarios
in which training samples are unevenly distributed across classes. Supervised learning methods are typically
designed to work with reasonably balanced data sets, but many real-world applications have to face imbalanced
data sets. A data set is said to be imbalanced when some classes are under-represented (minority classes) in
comparison with others (majority classes).

Learning from imbalanced data is among the contemporary challenges in machine learning, and multi-class
imbalance stands out as the most difficult scenario. In binary imbalanced learning the relationship between
classes is easy to define: one class is the majority and the other is the minority. In multi-class
scenarios, however, this is no longer obvious, as the relationships between classes may vary and one class can
be a minority with respect to some classes and a majority with respect to others at the same time. Therefore
canonical methods designed for binary cases cannot be directly applied in such scenarios.
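The point that a single class can be a minority and a majority at the same time can be illustrated with pairwise class-size ratios. The snippet below is only a minimal sketch: the function name `pairwise_imbalance_ratios` and the three-class label vector are hypothetical and are not taken from the project's software.

```python
from collections import Counter
from itertools import permutations


def pairwise_imbalance_ratios(labels):
    """Return {(a, b): n_a / n_b} for every ordered pair of distinct classes."""
    counts = Counter(labels)
    return {(a, b): counts[a] / counts[b] for a, b in permutations(counts, 2)}


# Hypothetical label vector: 100 examples of "A", 20 of "B", 5 of "C".
labels = ["A"] * 100 + ["B"] * 20 + ["C"] * 5
ratios = pairwise_imbalance_ratios(labels)

# "B" is a minority class relative to "A" (ratio below 1) ...
print(ratios[("B", "A")])  # 0.2
# ... and a majority class relative to "C" (ratio above 1).
print(ratios[("B", "C")])  # 4.0
```

In a binary problem there is only one such ratio, which is why the majority and minority roles are unambiguous there.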
In this project we hypothesise that it is possible to design efficient multi-class methods for such
compound imbalance problems that process all classes at once. The project explores three main
directions in multi-class imbalanced learning:

  1. how to analyse the structure of classes and identify the difficult examples,
  2. how to design imbalance-oriented pre-processing methods (such as under- and oversampling) specifically for multi-class problems,
  3. how to train efficient classifiers and ensemble learners with balanced performance on all classes.

We plan to identify general rules for designing efficient methods for learning from multi-class imbalanced data, to propose novel algorithms for this task, and to develop dedicated software packages that could be used in this area of research.
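To make the second and third directions concrete, the sketch below shows the simplest possible baseline, not one of the project's own algorithms: naive random oversampling that duplicates examples until every class matches the largest one, and macro-averaged recall as one common way of measuring "balanced performance". The helper names `random_oversample` and `macro_recall` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing examples with replacement from the class itself.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def macro_recall(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally, whatever its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: 8 examples of "A", 3 of "B", 1 of "C".
samples = list(range(12))
labels = ["A"] * 8 + ["B"] * 3 + ["C"] * 1

_, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))  # every class now has 8 examples

# A classifier that always predicts the largest class looks acceptable on
# plain accuracy (8 of 12 correct) but poor on macro-averaged recall (1/3).
always_a = ["A"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_a)) / len(labels)
print(accuracy, macro_recall(labels, always_a))
```

Random oversampling treats each class independently and ignores the relationships between classes, which is the kind of limitation that motivates pre-processing designed specifically for the multi-class case.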

The literature survey allows us to conclude that there is a need to develop novel methodologies for
handling multi-class imbalanced problems and for exploring the characteristics of examples within class
structures. This project aims to fill this gap by conducting a general investigation into how to analyse multi-class
imbalanced problems and design novel data pre-processing and classification algorithms dedicated to this


Krawczyk, Bartosz
McInnes, Bridget T
Local Ensemble Learning from Imbalanced and Noisy Data for Word Sense Disambiguation 
in Pattern Recognition, 2017
Koziarski, Michal
Krawczyk, Bartosz
Wozniak, Michal
The deterministic subspace method for constructing classifier ensembles 
in Pattern Analysis & Applications, 2017
Koziarski, Michal
Krawczyk, Bartosz
Wozniak, Michal
Radial-Based Approach to Imbalanced Data Oversampling 
Springer International Publishing 2017
Ksieniewicz, Pawel
Wozniak, Michal
Torgo, Luís
Krawczyk, Bartosz
Branco, Paula
Moniz, Nuno
Dealing with the task of imbalanced, multidimensional data classification using ensembles of exposers 
PMLR 2017
Zhang, Zhongliang
Krawczyk, Bartosz
García, Salvador
Rosales-Pérez, Alejandro
Herrera, Francisco
Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data 
in Knowledge-Based Systems, 2016