Deep Learning for Person Detection in Multi-spectral Videos
Faculty: Fakultät für Ingenieurwissenschaften, Informatik und Psychologie
Institution: Institut für Neuroinformatik
Resource / media type: Thesis (Master; Diplom), Text
Date of first publication: 2017-06-13
Person detection is a popular and still very active field of research in computer vision [ZBO+16] [BOHS14] [DWSP12]. It has many camera-based safety and security applications, such as search and rescue [Myr13], surveillance [Shu14], driver assistance systems, and autonomous driving [Enz11]. Although person detection has been investigated intensively, state-of-the-art approaches do not yet achieve human-level performance [ZBO+16]. Many existing approaches consider only visual-optical (VIS) RGB images. Infrared (IR) images are a promising source for further improvements [HPK+15] [WFHB16] [LZWM16]. Therefore, this thesis proposes an approach that uses multi-spectral input images and is based on the Faster R-CNN framework [RHGS16]. In contrast to existing approaches, only the Region Proposal Network (RPN) of Faster R-CNN is used [ZLLH16]. Two different training strategies [WFHB16] for training the RPN separately on VIS and IR images are evaluated. One strategy initializes the RPN from a pre-trained model, while the other additionally pre-finetunes the RPN on an auxiliary dataset. After the RPN models have been trained separately on VIS and IR data, five fusion approaches that exploit the complementary information of the VIS and IR RPNs are analyzed. The fusion approaches differ in the layers at which fusion is applied. The Fusion RPN provides a performance gain of around 20% compared to the RPNs operating on only one of the two image spectra. An additional performance gain is achieved by applying a Boosted Decision Forest (BDF) to the deep features extracted from different convolutional layers of the RPN [ZLLH16]. This approach significantly reduces the number of False Positives (FPs) and thus boosts detector performance by around 14% compared to the Fusion RPN. Furthermore, the results confirm the conclusion of Zhang et al. [ZLLH16] that an RPN alone can outperform the Faster R-CNN approach for the task of person detection.
On the KAIST Multispectral Pedestrian Detection Benchmark [HPK+15], state-of-the-art results are achieved with a log-average Miss Rate (MR) of 29.83%. Compared to the recent benchmark results [LZWM16], this is a relative improvement of around 18%.
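For readers unfamiliar with the metric, the log-average Miss Rate is commonly computed as the geometric mean of the miss rates sampled at nine False Positives Per Image (FPPI) values evenly spaced in log space between 10^-2 and 10^0. The following is a minimal sketch of that convention, not the evaluation code used in the thesis. The function name `log_average_miss_rate`, its parameters, and the fallback to the first curve point when no point lies at or below a reference FPPI are assumptions made for illustration.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at FPPI reference values
    evenly spaced in log space over [1e-2, 1e0].

    fppi and miss_rate are equally long sequences describing one
    miss-rate-versus-FPPI curve (hypothetical input format).
    """
    if len(fppi) != len(miss_rate) or len(fppi) == 0:
        raise ValueError("fppi and miss_rate must be non-empty and equally long")

    # Order the curve by increasing FPPI.
    curve = sorted(zip(fppi, miss_rate))

    # Reference points 10^-2, 10^-1.75, ..., 10^0 for num_points = 9.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    log_sum = 0.0
    for ref in refs:
        # Take the miss rate of the last curve point with FPPI <= ref.
        # Assumption: fall back to the first point if none qualifies.
        sampled = curve[0][1]
        for x, mr in curve:
            if x <= ref:
                sampled = mr
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        log_sum += math.log(max(sampled, 1e-10))

    return math.exp(log_sum / num_points)
```

Averaging in log space weights each sampled operating point equally, so a detector cannot hide a poor miss rate at low FPPI behind a good one at high FPPI.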
Pattern recognition systems