Compound data stream classification methods based on unsupervised and active learning

2013/09/B/ST6/02264, funded by the Polish National Science Center with a budget of 669 292 zł

Project leader: prof. dr hab. inż. Michał Woźniak

Machine Learning Team (Zespół Uczenia Maszynowego)

About the project


The project concerns machine learning algorithms for data stream classification. The primary objective in designing such systems is to achieve the highest efficiency, understood as a trade-off between accuracy and processing time. To achieve this goal, the model of the system has to be adapted to the specifics of the task. In the case of streaming data classification we should take into account that: (a) the characteristics of the data may change over time, a phenomenon called concept drift, and (b) the computational speed of the system must be high enough to process large amounts of information in an acceptably short time.
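To make the notion of concept drift more concrete, the sketch below monitors a classifier's stream of errors in the style of the well-known Drift Detection Method (DDM). It is an illustration only and is not one of the algorithms developed in the project; the class name `DDMDetector`, the helper `first_drift_index`, the thresholds, and the synthetic error stream are all assumptions made for this example.

```python
import math


class DDMDetector:
    """DDM-style detector over a stream of 0/1 classification errors.

    Illustrative sketch; parameter values are assumptions, not project settings.
    """

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0              # number of samples seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point seen so far
        self.s_min = math.inf   # its standard deviation

    def update(self, error):
        """error: 1 if the sample was misclassified, else 0.

        Returns 'stable', 'warning' or 'drift'.
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the point where p + s was lowest.
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()        # start learning the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def first_drift_index(errors):
    """Return the index of the first 'drift' signal, or None if there is none."""
    detector = DDMDetector()
    for i, e in enumerate(errors):
        if detector.update(e) == "drift":
            return i
    return None


# Synthetic stream: 10% error rate for 500 samples, then 50% error rate.
stream = [1 if i % 10 == 0 else 0 for i in range(500)]
stream += [1 if i % 2 == 0 else 0 for i in range(500)]
drift_at = first_drift_index(stream)
```

In this synthetic stream the error rate jumps at sample 500, so the detector is expected to signal drift shortly after that point. Note that this detector needs true labels to compute errors, which is a requirement that label-free detection methods aim to remove.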

In the project we plan to develop a number of algorithms that ensure high resistance of the classification system to the aforementioned concept drift. We also plan to investigate the possibility of applying algorithms based on distributed and parallel programming paradigms in order to ensure high processing speed for streaming data.

Concluding, we define the following project objectives:

  1. Developing new methods of supervised and unsupervised concept drift detection along with respective classification algorithms dedicated to stream processing;
  2. Developing new classifier models along with respective adaptive learning algorithms aimed at continuous adjustment of classifier parameters to the changing characteristics of the data stream, especially for classifier ensembles;
  3. Developing machine learning algorithms using distributed and parallel programming paradigms.
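As a rough illustration of the second objective, the following sketch shows how an ensemble can keep adjusting to a changing stream by re-weighting its members, using the classical weighted majority scheme. It is not the project's method; the class `WeightedMajorityEnsemble`, the function `run_demo`, the two toy experts, and the penalty factor `beta` are hypothetical names and values chosen for this example.

```python
class WeightedMajorityEnsemble:
    """Weighted majority voting with multiplicative weight updates."""

    def __init__(self, experts, beta=0.5):
        self.experts = list(experts)   # callables mapping x to a label
        self.beta = beta               # penalty factor in (0, 1)
        self.weights = [1.0] * len(self.experts)

    def predict(self, x):
        # Sum the weights of the experts voting for each label.
        votes = {}
        for expert, w in zip(self.experts, self.weights):
            label = expert(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Penalise every expert that misclassified (x, y), then renormalise.
        for i, expert in enumerate(self.experts):
            if expert(x) != y:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


def run_demo():
    """Feed a stream whose labelling rule flips halfway through."""
    experts = [lambda x: int(x > 0), lambda x: int(x <= 0)]
    ensemble = WeightedMajorityEnsemble(experts)
    xs = [1 if i % 2 == 0 else -1 for i in range(60)]

    # First concept: label is 1 when x > 0.
    for x in xs[:20]:
        ensemble.update(x, int(x > 0))
    before = list(ensemble.weights)

    # Second concept (after the drift): label is 1 when x <= 0.
    for x in xs[20:]:
        ensemble.update(x, int(x <= 0))
    after = list(ensemble.weights)
    return ensemble, before, after


ensemble, weights_before, weights_after = run_demo()
```

Before the change the first expert dominates the vote; after enough samples from the new concept the second expert takes over, so the ensemble's prediction follows the drift without retraining any member.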


To evaluate the quality of the proposed methods we will carry out systematic computer experiments. We are going to use the analytical approach as well, but its scope is usually very limited because of the assumptions and simplifications that have to be made, which limits its usefulness. All experiments will be carried out using KNIME and our own software written in Matlab, R or Java. We are going to develop a unified experimental framework which will be the basis for carrying out all the experiments and for comprehensive analysis.

For the tasks connected with distributed machine learning algorithms, we will design and write our own software using CUDA and MPI.


The subject of the project belongs to a rapidly developing line of research on machine learning methods for the analysis of streaming data. Currently it is hard to find works on one-class classifiers for concept drift or on methods for concept drift detection which do not require information about object labels. The studies are intended to fill this gap.

The expected outcomes of the first three years of the project are the graduation of three PhD candidates, one habilitation, and the opening of a new post-doc position. The results of the research conducted in the project will be published in esteemed journals.

It is worth mentioning that during this project our team will actively take advantage of opportunities created by the European project ENGINE - European research centre of Network intelliGence for INnovation Enhancement, FP7-REGPOT-2012-2013-1, 2013-16, no. 316097 (Coordination and support actions (Supporting Action) - Capacities Work Programme: Research Potential), which aims at increasing the research potential of selected teams from Wrocław University of Technology. ENGINE is not a research project itself; it only supports research development.