Semi-supervised learning with committees: exploiting unlabeled data using ensemble learning algorithms

vts_7560_10802.pdf (6.205Mb)
313 pages
Publication date
2011-02-21
Authors
Abdel Hady, Mohamed Farouk
Dissertation
Faculties
Fakultät für Ingenieurwissenschaften und Informatik
Abstract
Supervised machine learning is a branch of artificial intelligence concerned with computer programs that automatically improve with experience by extracting knowledge from examples. It builds predictive models from labeled data. Such learning approaches are useful for many real-world applications, particularly for tasks involving the automatic categorization, retrieval and extraction of knowledge from large collections of data such as text, images and videos. However, labeling the training data for real-world applications is difficult, expensive, or time-consuming, as it requires the effort of human annotators, sometimes with specific domain experience and training. Obtaining these labels from domain experts also carries implicit costs, since the experts' time and financial resources are limited. This is especially true for applications that involve a large number of class labels, some of which may be similar to one another.
Semi-supervised learning (SSL) addresses this inherent bottleneck by allowing the model to integrate part or all of the available unlabeled data into its supervised learning. The goal is to maximize the learning performance of the model through such newly labeled examples while minimizing the work required of human annotators. Exploiting unlabeled data to improve learning performance has become an active research topic during the last decade. Notably, semi-supervised learning and ensemble learning are two important paradigms that were developed almost in parallel and with different philosophies. Semi-supervised learning tries to improve generalization performance by exploiting unlabeled data, while ensemble learning tries to achieve the same objective by using multiple predictors.
In this thesis, I concentrate on SSL with committees and especially on co-training style algorithms.
Date created
2011
Subject headings
[GND]: Data Mining | Dempster-Shafer-Theorie
[LCSH]: Active learning
[MeSH]: Man-machine systems | Neoplasms; Classification
[Free subject headings]: Co-training | Dempster-Shafer evidence theory | Ensemble learning | Learning from unlabeled data | Multiple classifier systems | Semi-supervised learning
[DDC subject group]: DDC 004 / Data processing & computer science
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-1750
Abdel Hady, Mohamed Farouk (2011): Semi-supervised learning with committees: exploiting unlabeled data using ensemble learning algorithms. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-1750