Deep Learning for Person Detection in Multi-spectral Videos
Thesis (Master's; Diplom)
Authors
König, Daniel
Reviewers
Neumann, Heiko
Teutsch, Michael
Faculties
Fakultät für Ingenieurwissenschaften, Informatik und Psychologie
Institutions
Institut für Neuroinformatik
Abstract
Person detection is a popular and still very active field of research
in computer vision [ZBO+16] [BOHS14] [DWSP12]. There are many
camera-based safety and security applications such as search and
rescue [Myr13], surveillance [Shu14], driver assistance systems, or
autonomous driving [Enz11]. Although person detection is intensively
investigated, the state-of-the-art approaches do not achieve the
performance of humans [ZBO+16]. Many existing approaches consider only
visual-optical (VIS) RGB images. Infrared (IR) images are a
promising source for further improvements [HPK+15] [WFHB16] [LZWM16].
Therefore, this thesis proposes an approach using multi-spectral input
images based on the Faster R-CNN framework [RHGS16]. In contrast to
existing approaches, only the Region Proposal Network (RPN) of Faster
R-CNN is used [ZLLH16]. Two training strategies [WFHB16] for training
the RPN separately on VIS and IR images are evaluated. The first
initializes the RPN from a pre-trained model, while the second
additionally pre-finetunes the RPN on an auxiliary dataset. After
training the RPN models separately for VIS and IR data, five different
fusion approaches are analyzed that use the complementary information
of the VIS and IR RPNs. The fusion approaches differ in the layers
where fusion is applied. The Fusion RPN provides a performance gain of
around 20% compared to the RPNs operating on only one of the two image
spectra. An additional performance gain is achieved by applying a
Boosted Decision Forest (BDF) on the deep features extracted from
different convolutional layers of the RPN [ZLLH16]. This approach
significantly reduces the number of False Positives (FPs) and thus
boosts the detector performance by around 14% compared to the Fusion
RPN. Furthermore, the results confirm the conclusion of Zhang et al.
[ZLLH16] that an RPN alone can outperform the Faster R-CNN approach
for the task of person detection. On the KAIST Multispectral
Pedestrian Detection Benchmark [HPK+15], state-of-the-art results are
achieved with a log-average Miss Rate (MR) of 29.83%. Compared to the
recent benchmark results [LZWM16], this is a relative improvement of
around 18%.
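As a reading aid for the two figures quoted at the end of the abstract, the following is a minimal sketch of how a log-average miss rate and a relative improvement are commonly computed. It assumes the convention widely used for pedestrian detection benchmarks (geometric mean of the miss rate sampled at nine false-positives-per-image values, evenly spaced in log space between 0.01 and 1); the abstract itself does not spell out the evaluation protocol. The function names and the back-calculated baseline are hypothetical and for illustration only. Lower miss rates are better.

```python
import math
from bisect import bisect_right


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1e0):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    fppi      -- false positives per image, sorted in ascending order
    miss_rate -- miss rate at each fppi value (same length as fppi)
    The sampling range and number of points are assumptions (common
    benchmark convention), not values taken from the abstract.
    """
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    refs = [10 ** (log_lo + i * step) for i in range(num_points)]

    samples = []
    for ref in refs:
        # Index of the last curve point whose FPPI does not exceed the reference.
        idx = bisect_right(fppi, ref) - 1
        # If the curve starts above the reference, count it as missing everything.
        mr = miss_rate[idx] if idx >= 0 else 1.0
        samples.append(max(mr, 1e-10))  # guard against log(0)

    return math.exp(sum(math.log(s) for s in samples) / len(samples))


def relative_improvement(baseline_mr, new_mr):
    """Fraction by which new_mr is lower than baseline_mr."""
    return (baseline_mr - new_mr) / baseline_mr


# Back-of-envelope check using only the rounded figures from the abstract:
# a miss rate of 29.83% that is "around 18%" better implies a baseline of
# roughly 29.83 / (1 - 0.18). This is an implied value, not a reported one.
implied_baseline = 29.83 / (1 - 0.18)
```

Because the abstract only gives the improvement as "around 18%", the implied baseline is approximate and should not be read as the value reported in [LZWM16].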
Date of creation / completion
2017
Controlled subject headings
Mustererkennung [GND]
Neuronales Netz [GND]
Computer vision [LCSH]
Pattern recognition systems [LCSH]
Digital images [LCSH]
Keywords
Multispectral; Faster R-CNN; Fusion; Deep learning; Person detection
DDC subject group
DDC 004 / Data processing & computer science
Suggested citation
König, Daniel (2017): Deep Learning for Person Detection in Multi-spectral Videos. Open Access Repositorium der Universität Ulm. http://dx.doi.org/10.18725/OPARU-4383