Author: Schneider, Markus (dc.contributor.author)
Date of accession: 2017-02-03T09:49:20Z (dc.date.accessioned)
Available in OPARU since: 2017-02-03T09:49:20Z (dc.date.available)
Year of creation: 2017 (dc.date.created)
Date of first publication: 2017-02-03 (dc.date.issued)
Abstract: Anomalies are patterns in data or events which are unlikely to appear under normal conditions. It is of central interest to detect such anomalous instances to prevent damage or to extract valuable information from data. While statistics and machine learning have developed several excellent techniques for anomaly detection, most of them suffer from poor algorithmic scalability when applied to large-scale datasets, since computational complexity and memory requirements become the limiting factors. This dissertation makes several contributions to the problem of large-scale anomaly detection, centered on a novel method named EXPoSE, which estimates the similarity between a new, unseen observation and the distribution of data under normal conditions. In this way, EXPoSE measures the likelihood that a new observation is anomalous. Its core is based on the kernel embedding of distributions, which maps a probability measure into a reproducing kernel Hilbert space where it can be manipulated efficiently. The kernel embedding representation requires no parametric assumptions or explicit description of the probability measure. This constitutes an important advantage, since the distributions of normal and anomalous instances are in general unknown. The main contributions of this work are efficient algorithms to train and evaluate the EXPoSE anomaly detector. This can be achieved with computational complexity and memory requirements independent of the dataset size, which is key to solving large-scale machine learning problems. The dependence on the reproducing kernel function as a similarity measure enables application to many domains and makes it possible to incorporate domain and expert knowledge into the modeling process. The key techniques are further developed for online and streaming anomaly detection, where instances arrive in a possibly infinite sequence of observations.
A crucial requirement in these applications is the ability to make predictions as data arrive, based on the information obtained from previous observations. One of the major challenges is the non-stationary nature of streams, in which our understanding of what is normal and what is anomalous changes over time. This makes it necessary to adapt to such changes, e.g., by forgetting outdated information while incorporating new knowledge. The simplicity of the proposed methodologies facilitates a theoretical analysis that provides guarantees in terms of convergence rates and probabilistic bounds. (dc.description.abstract)
Language: en_US (dc.language.iso)
Publisher: Universität Ulm (dc.publisher)
License: Standard (dc.rights)
Link to license text: https://oparu.uni-ulm.de/xmlui/license_v3 (dc.rights.uri)
Dewey Decimal Group: DDC 004 / Data processing & computer science (dc.subject.ddc)
LCSH: Artificial intelligence (dc.subject.lcsh)
LCSH: Hilbert space (dc.subject.lcsh)
LCSH: Machine learning (dc.subject.lcsh)
LCSH: Anomaly detection (dc.subject.lcsh)
Title: Expected similarity estimation for large-scale anomaly detection (dc.title)
Resource type: Dissertation (dc.type)
Date of acceptance: 2016-12-21 (dcterms.dateAccepted)
Referee: Palm, Günther (dc.contributor.referee)
Referee: Ertel, Wolfgang (dc.contributor.referee)
Referee: Schwenker, Friedhelm (dc.contributor.referee)
DOI: http://dx.doi.org/10.18725/OPARU-4222 (dc.identifier.doi)
PPN: 880442190 (dc.identifier.ppn)
URN: http://nbn-resolving.de/urn:nbn:de:bsz:289-oparu-4261-2 (dc.identifier.urn)
GND: Anomalieerkennung (dc.subject.gnd)
GND: Künstliche Intelligenz (dc.subject.gnd)
GND: Maschinelles Lernen (dc.subject.gnd)
GND: Massendaten (dc.subject.gnd)
GND: Hilbert-Raum (dc.subject.gnd)
Faculty: Fakultät für Ingenieurwissenschaften, Informatik und Psychologie (uulm.affiliationGeneral)
Institution: Institut für Neuroinformatik (uulm.affiliationSpecific)
Shelfmark print version: W: W-H 14.984 (uulm.shelfmark)
Grantor of degree: Fakultät für Ingenieurwissenschaften, Informatik und Psychologie (uulm.thesisGrantor)
DCMI Type: Text (uulm.typeDCMI)
Type: First publication (uulm.veroeffentlichung)
Category: Publications (uulm.category)
University Bibliography: yes (uulm.unibibliographie)
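
Illustrative note on the abstract: the idea of scoring a new observation by its expected similarity to normal data can be sketched roughly as follows. This is not the dissertation's implementation. It is a minimal sketch that assumes a Gaussian kernel and approximates the kernel mean embedding with random Fourier features (an assumption made here so that memory stays independent of the dataset size). The class name `ExposeSketch`, its parameters, and the exponential forgetting rate are hypothetical names chosen for illustration.

```python
import math
import random


class ExposeSketch:
    """Hypothetical EXPoSE-style scorer built on random Fourier features."""

    def __init__(self, dim, n_features=200, bandwidth=1.0, seed=0):
        rng = random.Random(seed)
        self.n_features = n_features
        # Frequencies drawn from N(0, 1/bandwidth^2) approximate a Gaussian kernel.
        self.weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
                        for _ in range(n_features)]
        self.offsets = [rng.uniform(0.0, 2.0 * math.pi)
                        for _ in range(n_features)]
        # Approximate kernel mean embedding of the normal data (fixed size).
        self.mean = [0.0] * n_features
        self.count = 0

    def features(self, x):
        """Map x to a finite-dimensional approximation of the kernel feature map."""
        scale = math.sqrt(2.0 / self.n_features)
        return [scale * math.cos(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.weights, self.offsets)]

    def update(self, x, forgetting=None):
        """Fold one observation into the embedding.

        With forgetting=None this is an exact running mean; with a constant
        rate in (0, 1) older observations are down-weighted exponentially,
        which is one simple way to follow a non-stationary stream.
        """
        phi = self.features(x)
        self.count += 1
        if forgetting is None or self.count == 1:
            rate = 1.0 / self.count
        else:
            rate = forgetting
        self.mean = [(1.0 - rate) * m + rate * p
                     for m, p in zip(self.mean, phi)]

    def score(self, z):
        """Approximate expected similarity of z to the data seen so far.

        Lower values suggest that z is more likely to be anomalous.
        """
        return sum(p * m for p, m in zip(self.features(z), self.mean))


# Usage: normal data clustered around the origin in two dimensions.
data_rng = random.Random(1)
model = ExposeSketch(dim=2)
for _ in range(500):
    model.update([data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)])

inlier_score = model.score([0.0, 0.0])
outlier_score = model.score([6.0, 6.0])
print(inlier_score > outlier_score)
```

In this sketch both the update and the score cost a fixed amount of work per observation, determined by the number of random features rather than by how many observations have been seen.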

