Human action parsing in untrimmed videos and its applications for elderly people healthcare

First publication
2020-10-01
Authors
Zhang, Yan
Referees
Neumann, Heiko
Tang, Siyu
Dissertation
Faculties
Fakultät für Ingenieurwissenschaften, Informatik und PsychologieInstitutions
Institut für NeuroinformatikAbstract
Motivated by the demand to improve the quality of life of elderly people, this thesis investigates visual human behavior analysis and aims to propose effective and reliable healthcare solutions for elderly people in various scenarios. Specifically, we focus on understanding human behavior from videos captured by cameras in indoor environments, considering individual persons and actions at multiple levels of granularity. When human behaviors can be understood automatically and reliably by computers, the living environment becomes smart, provides effective assistance, and makes the interaction between elderly people and smart environments as convenient as the interaction between younger people and conventional environments.
Human behavior understanding from videos covers a broad range of tasks. To unify these tasks, we propose the action parsing task: given an untrimmed video containing various types of actions, action parsing assigns each individual frame an action label or cluster ID. Consequently, several classical tasks, e.g. action detection, video retrieval, action recognition and temporal action segmentation, are unified within one framework. In this thesis, we first propose an unsupervised method that assigns each frame in the input video a cluster ID, and then a deep learning-based supervised method that assigns each frame an action label in an end-to-end manner. The unsupervised method is hierarchical dynamic clustering, which incorporates several novel modules and is inspired by the conventional bag-of-visual-words method for action recognition. For the deep learning-based supervised action parsing method, we employ a spatiotemporal convolutional encoder-decoder network and propose novel bilinear pooling methods to realize fine-grained action parsing. We have applied the proposed methods to unsupervised action segmentation, abnormality (fainting) detection from omni-directional videos, explanation of how a deep neural network understands falls from videos, fine-grained human action parsing in daily living scenarios (recordings from both the third-person view and the egocentric view), and behavior understanding for surgical robots. All experimental results show that our proposed methods are effective and yield state-of-the-art performance.
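To make the supervised formulation concrete, the following is a minimal, illustrative sketch of per-frame action parsing with a temporal convolutional encoder-decoder and a simple outer-product (bilinear) pooling step before classification. The layer sizes, the pooling variant, and the class count are assumptions chosen for illustration; they are not the architecture proposed in the dissertation.

```python
import torch
import torch.nn as nn

class FrameWiseBilinearParser(nn.Module):
    """Sketch: label every frame of a feature sequence with a temporal
    encoder-decoder, followed by per-frame outer-product (bilinear) pooling.
    All dimensions are illustrative placeholders."""

    def __init__(self, feat_dim=64, hidden=32, num_classes=10):
        super().__init__()
        # temporal encoder: two strided convolutions quarter the frame rate
        self.encoder = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        # temporal decoder: transposed convolutions restore the frame rate
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )
        # classifier over the flattened per-frame bilinear feature
        self.classifier = nn.Linear(hidden * hidden, num_classes)

    def forward(self, x):                         # x: (batch, feat_dim, T)
        h = self.decoder(self.encoder(x))         # (batch, hidden, T)
        h = h.permute(0, 2, 1)                    # (batch, T, hidden)
        b = torch.einsum('bti,btj->btij', h, h)   # per-frame outer product
        b = b.flatten(2)                          # (batch, T, hidden*hidden)
        return self.classifier(b)                 # per-frame class scores

# usage: parse a 128-frame clip of 64-dimensional frame features
scores = FrameWiseBilinearParser()(torch.randn(2, 64, 128))
print(scores.shape)  # torch.Size([2, 128, 10]) -> one label distribution per frame
```

The per-frame outer product here stands in for the fine-grained bilinear pooling idea: it captures pairwise feature interactions at every time step before classification, while the encoder-decoder keeps the output aligned with the input frame rate.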
Date created
2020
Subject headings
[GND]: Computervision | Maschinelles Lernen
[LCSH]: Human activity recognition | Computer vision | Machine learning
[Free subject headings]: Human action analysis
[DDC subject group]: DDC 610 / Medicine & health | DDC 620 / Engineering & allied operations
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-33211
Zhang, Yan (2020): Human action parsing in untrimmed videos and its applications for elderly people healthcare. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-33211