An End-to-end deep learning framework for acoustic emotion recognition

Thesis_Sawin.pdf (2.548Mb)
First published
2023-06-06
Authors
Sawin, Michael
Advisor
Dresvyanskiy, Denis
Referee
Minker, Wolfgang
Bachelor's thesis
Faculties
Fakultät für Ingenieurwissenschaften, Informatik und Psychologie
Institutions
Institut für Nachrichtentechnik
Abstract
This work examines acoustic emotion recognition by a system that uses only an audio file containing a person's speech. This can be applied in several areas of human-computer interaction (HCI). In this context, a system consisting of only a microphone and a processor is less expensive than methods that use additional modalities for emotion recognition. This thesis explores how automatic emotion recognition from speech alone works and provides insight into the different technologies that can be applied to it.
The extraction of features and the computation of Mel spectrograms from audio files were investigated, along with their use as input to a machine learning algorithm or a neural network. A framework was developed around a model that is well suited to classifying the emotion in both short and longer audio clips.
The experiments in this thesis show the success that can be achieved by using convolutional neural networks with Mel spectrograms as input to classify an emotion based only on an audio file. Combined with different training methods, this results in a framework that can classify emotions from both short and longer audio files in a very short time.
In summary, automatic classification of emotions by a system is a very interesting topic in which different methods lead to different degrees of success. Building on the results of this work, a further step towards improved HCI can be taken, as systems can adapt their actions to the emotional state of the user.
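The abstract names Mel spectrograms as the network input but does not say how they are computed. As a rough illustration only, and not the thesis's implementation, the following Python sketch builds a triangular Mel filterbank and applies it to one frame of a power spectrum. The function names, the HTK-style Hz-to-mel formula, and all parameter values are assumptions chosen for this example.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert Hz to mel (HTK-style formula, assumed for this sketch)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> list:
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    # between 0 Hz and the Nyquist frequency.
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # slope up from lo to mid
            falling = (hi - f) / (hi - mid)   # slope down from mid to hi
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frame(bank: list, power_spectrum: list) -> list:
    """Project one power-spectrum frame onto the mel bands and take the log."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    return [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
```

Stacking the output of `log_mel_frame` over successive short-time frames of an audio clip gives a two-dimensional log-Mel spectrogram, which is the kind of image-like input a convolutional neural network can consume.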
Date created
2023
Subject headings
[GND]: Neuronales Netz | Maschinelles Lernen | Deep learning
[LCSH]: Neural networks (Computer science) | Machine learning
[Free subject headings]: Acoustic Emotion Recognition | Speech Emotion Recognition | Neural Network | An End-to-end Deep Learning Framework for Acoustic Emotion Recognition | Framework for Acoustic Emotion Recognition | Deep Learning Framework for Acoustic Emotion Recognition
[DDC subject group]: DDC 620 / Engineering & allied operations
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-48933
Sawin, Michael (2023): An End-to-end deep learning framework for acoustic emotion recognition. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. http://dx.doi.org/10.18725/OPARU-48933