Advisor | Dresvyanskiy, Denis | dc.contributor.advisor |
Author | Sawin, Michael | dc.contributor.author |
Date of accession | 2023-06-06T09:45:13Z | dc.date.accessioned |
Available in OPARU since | 2023-06-06T09:45:13Z | dc.date.available |
Year of creation | 2023 | dc.date.created |
Date of first publication | 2023-06-06 | dc.date.issued |
Abstract | This work examines acoustic emotion recognition by a system that uses only an audio file
containing a person’s speech. Such a system can be applied in several areas of human-computer interaction (HCI); in this context, a setup consisting of only a microphone and a processor is less expensive than approaches that rely on additional input modalities for emotion recognition. This thesis explores how automatic emotion recognition from speech alone works and provides insight into the different technologies that can be applied for it.
Feature extraction and the computation of Mel spectrograms from audio files were investigated, together with their use as input to a machine learning algorithm or a neural network. A framework was developed around a model that is well suited to classifying the emotion in both short and longer audio clips.
The experiments of this thesis show the success that can be achieved by using convolutional neural networks with Mel spectrograms as input to classify an emotion from an audio file alone. Combined with different training methods, this results in a framework that can classify emotions from both short and longer audio files in a very short time.
In summary, the automatic classification of emotions by a system is a rich topic in which multiple methods can achieve different degrees of success. Building on the results of this work, a further step towards improved HCI can be taken, as systems can adapt their actions to the emotional state of the user. | dc.description.abstract |
Language | en | dc.language.iso |
Publisher | Universität Ulm | dc.publisher |
License | CC BY 4.0 International | dc.rights |
Link to license text | https://creativecommons.org/licenses/by/4.0/ | dc.rights.uri |
Keyword | Acoustic Emotion Recognition | dc.subject |
Keyword | Speech Emotion Recognition | dc.subject |
Keyword | Neural Network | dc.subject |
Keyword | An End-to-end Deep Learning Framework for Acoustic Emotion Recognition | dc.subject |
Keyword | Framework for Acoustic Emotion Recognition | dc.subject |
Keyword | Deep Learning Framework for Acoustic Emotion Recognition | dc.subject |
Dewey Decimal Group | DDC 620 / Engineering & allied operations | dc.subject.ddc |
LCSH | Neural networks (Computer science) | dc.subject.lcsh |
LCSH | Machine learning | dc.subject.lcsh |
Title | An End-to-end Deep Learning Framework for Acoustic Emotion Recognition | dc.title |
Resource type | Bachelor's thesis (Abschlussarbeit) | dc.type |
Date of acceptance | 2023 | dcterms.dateAccepted |
Referee | Minker, Wolfgang | dc.contributor.referee |
DOI | http://dx.doi.org/10.18725/OPARU-48933 | dc.identifier.doi |
PPN | 1847534570 | dc.identifier.ppn |
URN | http://nbn-resolving.de/urn:nbn:de:bsz:289-oparu-49009-8 | dc.identifier.urn |
GND | Neuronales Netz | dc.subject.gnd |
GND | Maschinelles Lernen | dc.subject.gnd |
GND | Deep learning | dc.subject.gnd |
Faculty | Fakultät für Ingenieurwissenschaften, Informatik und Psychologie | uulm.affiliationGeneral |
Institution | Institut für Nachrichtentechnik | uulm.affiliationSpecific |
DCMI Type | Text | uulm.typeDCMI |
Category | Publications (Publikationen) | uulm.category |
Bibliography | uulm | uulm.bibliographie |
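
The abstract above describes a pipeline in which Mel spectrograms computed from speech audio are fed to a convolutional neural network for emotion classification. A minimal sketch of such a pipeline follows, assuming librosa for feature extraction and PyTorch for the classifier; the network is illustrative only, since the record does not specify the thesis's actual architecture, class set, or data (speech.wav, the four-class output, and all hyperparameters here are assumptions). The adaptive pooling layer is one simple way to let a fixed classifier accept both short and longer clips, matching the variable-length requirement mentioned in the abstract.

    # Minimal sketch: Mel-spectrogram features fed to a small CNN classifier.
    # Assumes librosa and PyTorch; architecture and class count are
    # illustrative, not the ones used in the thesis.
    import librosa
    import numpy as np
    import torch
    import torch.nn as nn

    def mel_spectrogram(path, sr=16000, n_mels=64):
        """Load an audio file and compute a log-Mel spectrogram."""
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)

    class EmotionCNN(nn.Module):
        """Tiny CNN over (1, n_mels, time) spectrogram inputs."""
        def __init__(self, n_classes=4):  # assumed class count
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # handles variable-length clips
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):
            h = self.features(x).flatten(1)
            return self.classifier(h)

    # Usage: classify one clip (untrained weights, shape-checking only).
    spec = mel_spectrogram("speech.wav")            # hypothetical file; (n_mels, time)
    x = torch.from_numpy(spec).float()[None, None]  # (1, 1, n_mels, time)
    logits = EmotionCNN()(x)
    print(logits.shape)                             # torch.Size([1, 4])

Because the pooling collapses the time axis before classification, the same weights can score a two-second utterance or a much longer recording, which is one plausible reading of the abstract's claim about handling short and longer audio files.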