Learning in layered multimodal classifier architectures for cognitive technical systems

First published
2016-07-08
Authors
Glodek, Michael
Referee
Palm, Günther
Martinetz, Thomas
Hammer, Barbara
Dissertation
Faculties
Fakultät für Ingenieurwissenschaften, Informatik und Psychologie
Institutions
Institut für Neuroinformatik
Abstract
Modern computer systems have fundamentally changed the way we live. They improve our effectiveness by assisting us in our work and daily tasks. However, current systems are limited to the direct input of commands. Furthermore, they are unable to make active decisions on behalf of the user, mostly because they lack information about the user. Cognitive technical systems (CTS) address these deficiencies by recognizing user states and the user’s environment with the help of sensor data. The derived information is collected in a knowledge base and further processed by the application and the dialog management to perform decision making.
In this thesis, new methods addressing sensor-based state recognition in the context of CTS in human-computer interaction are developed and empirically evaluated. The focus is on large multimodal and temporal multiple classifier systems. Furthermore, the work covers sequential classifiers, the handling of partially available information, and the integration of sub-symbolic and symbolic information for complex state recognition. The following approaches are presented in this work: the ensemble Gaussian mixture model (EGMM), the conditioned hidden Markov model (CHMM), the fuzzy conditioned hidden Markov model (FCHMM), the hidden Markov model using graph probability densities (HMM-GPD), the Markov fusion network (MFN), the Kalman filter for classifier fusion, and layered classifier architectures.
The EGMM extends the classical GMM by the ensemble technique in order to achieve a more robust density estimation. The CHMM and the FCHMM extend the HMM by an additional causal sequence which influences the hidden states. The CHMM uses a sequence of discrete causes, whereas the FCHMM uses a sequence of causes with fuzzy memberships. Both approaches can further be used to integrate symbolic information. The HMM-GPD introduces graph probability densities as observations in the HMM. The MFN and the Kalman filter for classifier fusion are probabilistic algorithms for temporal and multimodal late fusion which are robust against sensor failures. Within this thesis, the unidirectional layered architecture (ULA) and the bidirectional layered architecture (BLA) are proposed. Both architectures recognize complex classes based on probabilistic logical rules and the temporal combination of basic patterns. Each layer recognizes patterns based on the class predictions of the underlying layer. Hence, upper layers recognize more complex patterns. The BLA additionally propagates information in the direction of the lower layers.
The empirical evaluation of the proposed methods is performed on datasets for affective state and activity recognition, e.g. the Freetalk dataset, AVEC 2011, AVEC 2012, AVEC 2013 and UUlmMAD. The EGMM proved to be more robust and accurate than conventional GMM approaches. It was also shown that selecting suitable parameters is considerably easier with the EGMM. Further evaluations showed that multimodal late fusion using the CHMM outperformed the HMM on the Freetalk dataset. The HMM-GPD was studied in the field of activity recognition and showed good view-invariant performance. The classification was performed on sequences of graphs extracted from partially occluded skeleton models. The MFN and the Kalman filter for classifier fusion were studied on the AVEC datasets and achieved good results in comparison to other approaches. Furthermore, it was shown that they outperformed classic point-wise and windowed fusion approaches. A comprehensive study analyzing the ULA showed that the FCHMM is well-suited to recognize states on different layers given unsegmented sequential data. A dynamic Markov logic network implemented the probabilistic logical rules in the uppermost layer. The thesis further presents a new dataset which was recorded in order to study the BLA.
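The idea of applying the ensemble technique to density estimation can be illustrated with a small sketch. The abstract does not state how the ensemble members of the EGMM are trained or combined, so the sketch below simply assumes equal-weight averaging of member densities. The function names `gmm_pdf` and `ensemble_pdf` and all mixture parameters are hypothetical and chosen only for illustration.

```python
import math


def gmm_pdf(x, weights, means, variances):
    """Density of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )


def ensemble_pdf(x, members):
    """Equal-weight average of several GMM densities (assumed combination rule)."""
    return sum(gmm_pdf(x, *member) for member in members) / len(members)


# Hypothetical ensemble: three GMMs with 1, 2 and 3 components,
# each given as (weights, means, variances).
members = [
    ([1.0], [0.0], [1.0]),
    ([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5]),
    ([0.3, 0.4, 0.3], [-1.5, 0.0, 1.5], [0.4, 0.4, 0.4]),
]

density_at_zero = ensemble_pdf(0.0, members)
```

Because each member integrates to one, their average is again a valid density. Averaging smooths out the effect of any single member whose number of components or initialization was poorly chosen.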
The development of a CTS brings new challenges to the recognition of the user’s state and environment. The presented work identifies important properties in this area and proposes and evaluates methods tailored to it.
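How a Kalman filter can fuse classifier decisions over time while tolerating sensor failures can be sketched as follows. This is a minimal scalar sketch under stated assumptions, not the formulation used in the thesis: the state is a single class-membership estimate with an identity transition, every classifier output is treated as a noisy measurement of that state, and missing outputs are marked with `None`. The function name `kalman_fuse` and the parameters `q`, `r`, `x0` and `p0` are hypothetical.

```python
def kalman_fuse(decisions, q=0.01, r=0.1, x0=0.5, p0=1.0):
    """Fuse per-time-step classifier outputs with a scalar Kalman filter.

    decisions: list over time steps; each entry is a list with one output
               per classifier (a float in [0, 1]) or None if it is missing.
    q, r:      assumed process and measurement noise variances.
    x0, p0:    assumed initial estimate and initial variance.
    Returns the fused estimates and their variances, one per time step.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for step in decisions:
        # Prediction: the state is carried over, the uncertainty grows by q.
        p = p + q
        for z in step:
            if z is None:
                # Sensor failure: skip the update, keep the prediction.
                continue
            # Update: blend prediction and measurement via the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
        variances.append(p)
    return estimates, variances


# Two classifiers, three time steps; the second step has no measurements.
estimates, variances = kalman_fuse([[0.9, None], [None, None], [0.8, 0.7]])
```

When no classifier delivers an output, the estimate is held and only its variance increases, so later measurements are weighted more strongly once the sensors return.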
Date created
2016
Subject headings
[GND]: Markov-Modell | Datenfusion | Multisensor
[LCSH]: Graphical modelling | Kalman filtering | Multiple criteria decision making | Multisensor data fusion
[Free subject headings]: Ensemble GMM | Probabilistic graphical model | Markov fusion network | Kalman filter for classifier fusion | Unidirectional layered architecture | Bidirectional layered architecture | Inequality constraint multi-class F2-support vector machine | Graph probability density | Conditional hidden Markov model | Fuzzy conditional hidden Markov model
[DDC subject group]: DDC 000 / Computer science, information & general works
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-4030
Glodek, Michael (2016): Learning in layered multimodal classifier architectures for cognitive technical systems. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-4030