
Author | König, Daniel | dc.contributor.author
Date of accession | 2017-06-13T10:36:54Z | dc.date.accessioned
Available in OPARU since | 2017-06-13T10:36:54Z | dc.date.available
Year of creation | 2017 | dc.date.created
Date of first publication | 2017-06-13 | dc.date.issued
Abstract | Person detection is a popular and still very active field of research in computer vision [ZBO+16] [BOHS14] [DWSP12]. There are many camera-based safety and security applications such as search and rescue [Myr13], surveillance [Shu14], driver assistance systems, or autonomous driving [Enz11]. Although person detection is intensively investigated, state-of-the-art approaches do not yet achieve human performance [ZBO+16]. Many existing approaches consider only visual-optical (VIS) RGB images. Infrared (IR) images are a promising source for further improvements [HPK+15] [WFHB16] [LZWM16]. Therefore, this thesis proposes an approach that uses multi-spectral input images and is based on the Faster R-CNN framework [RHGS16]. Unlike existing approaches, only the Region Proposal Network (RPN) of Faster R-CNN is used [ZLLH16]. Two different training strategies [WFHB16] for training the RPN separately on VIS and IR images are evaluated. One strategy starts from a pre-trained model for initialization, while the other additionally pre-finetunes the RPN on an auxiliary dataset. After the RPN models have been trained separately on VIS and IR data, five different fusion approaches that use the complementary information of the VIS and IR RPNs are analyzed. The fusion approaches differ in the layers at which fusion is applied. The Fusion RPN provides a performance gain of around 20% compared to the RPNs operating on only one of the two image spectra. An additional performance gain is achieved by applying a Boosted Decision Forest (BDF) to the deep features extracted from different convolutional layers of the RPN [ZLLH16]. This approach significantly reduces the number of False Positives (FPs) and thus boosts the detector performance by around 14% compared to the Fusion RPN. Furthermore, the conclusion of Zhang et al. [ZLLH16] that an RPN alone can outperform the Faster R-CNN approach for the task of person detection is confirmed.
On the KAIST Multispectral Pedestrian Detection Benchmark [HPK+15], state-of-the-art results are achieved with a log-average Miss Rate (MR) of 29.83%. Compared to the recent benchmark results [LZWM16], this is a relative improvement of around 18%. | dc.description.abstract
Language | en_US | dc.language.iso
Publisher | Universität Ulm | dc.publisher
License | Standard | dc.rights
Link to license text | https://oparu.uni-ulm.de/xmlui/license_v3 | dc.rights.uri
Keyword | Multispectral | dc.subject
Keyword | Faster R-CNN | dc.subject
Keyword | Fusion | dc.subject
Keyword | Deep learning | dc.subject
Keyword | Person detection | dc.subject
Dewey Decimal Group | DDC 004 / Data processing & computer science | dc.subject.ddc
LCSH | Computer vision | dc.subject.lcsh
LCSH | Pattern recognition systems | dc.subject.lcsh
LCSH | Digital images | dc.subject.lcsh
Title | Deep Learning for Person Detection in Multi-spectral Videos | dc.title
Resource type | Thesis (Master; Diplom) | dc.type
Date of acceptance | 2017 | dcterms.dateAccepted
Referee | Neumann, Heiko | dc.contributor.referee
Referee | Teutsch, Michael | dc.contributor.referee
DOI | http://dx.doi.org/10.18725/OPARU-4383 | dc.identifier.doi
PPN | 892865822 | dc.identifier.ppn
URN | http://nbn-resolving.de/urn:nbn:de:bsz:289-oparu-4422-7 | dc.identifier.urn
GND | Mustererkennung (pattern recognition) | dc.subject.gnd
GND | Neuronales Netz (neural network) | dc.subject.gnd
Faculty | Fakultät für Ingenieurwissenschaften, Informatik und Psychologie | uulm.affiliationGeneral
Institution | Institut für Neuroinformatik | uulm.affiliationSpecific
DCMI Type | Text | uulm.typeDCMI
Type | First publication | uulm.veroeffentlichung
Category | Publications | uulm.category
University Bibliography | yes | uulm.unibibliographie
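
Note on the metric named in the abstract: the log-average Miss Rate (MR) is commonly computed, following the Caltech pedestrian evaluation convention, as the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 10^-2 and 10^0. The following is a minimal sketch of that common definition, not the thesis's own evaluation code; the function name `log_average_miss_rate` and its parameters are hypothetical, and the exact protocol used for the KAIST results may differ in detail.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at reference FPPI values in [1e-2, 1e0].

    fppi:      false positives per image along the detector curve, increasing
    miss_rate: miss rate at each curve point (same length as fppi)
    Both names are illustrative assumptions, not taken from the thesis.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)

    # Reference points evenly spaced in log space between 0.01 and 1 FPPI.
    refs = np.logspace(-2.0, 0.0, num_points)

    sampled = []
    for ref in refs:
        # Last curve point whose FPPI does not exceed the reference value.
        idx = np.searchsorted(fppi, ref, side="right") - 1
        # If the curve starts above this reference, count it as a miss rate of 1.
        sampled.append(miss_rate[idx] if idx >= 0 else 1.0)

    # Clamp to avoid log(0), then average in the log domain.
    sampled = np.maximum(np.array(sampled), 1e-10)
    return float(np.exp(np.mean(np.log(sampled))))
```

Because the average is taken in the log domain, a detector curve with a constant miss rate yields exactly that miss rate, and lower values are better.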

