Combining Neural Networks with Knowledge for Spoken Dialogue Systems

First published
2023-01-26
Authors
Ahmed, Waheed
Referees
Minker, Wolfgang
Qi, Guilin
Dissertation
Faculties
Fakultät für Ingenieurwissenschaften, Informatik und Psychologie
Institutions
Institut für Nachrichtentechnik
External cooperations
School of Computer Science and Engineering, Southeast University, China
Abstract
Spoken dialogue systems are designed to communicate with users and assist them using natural language. Task-oriented dialogue systems, for example, help users accomplish specific tasks such as booking a flight, navigating to a particular destination, or finding a restaurant. Conventional dialogue systems are typically highly handcrafted, relying on complex logic and rules. These systems are composed of four components: natural language understanding (NLU), dialogue state tracking (DST), dialogue manager (DM), and natural language generation (NLG). Despite advances in natural language understanding and dialogue learning, these systems still face significant problems with robustness and scalability to new domains. The NLU module is built using domain-specific rules, making it difficult to extend to new domains. As a result, statistical models such as deep neural networks have been proposed for NLU tasks. However, these deep neural network models rely on large amounts of labeled data, whereas labeled data is often limited in practice. With small training sets, they fail to generalize to test data because they tend to capture spurious features rather than semantically significant domain features. Moreover, in many applications, obtaining high-quality labeled data is a highly costly and time-consuming process.
In this thesis, we address the limitations of the existing NLU module of spoken dialogue systems by examining various methods for combining neural networks with symbolic rule knowledge, in order to reduce the need for labeled training data and to improve generalization and scalability to new domains. The contributions of this dissertation are as follows:
1) Our first contribution introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion-building domain. We employ a pre-trained language model, namely Bidirectional Encoder Representations from Transformers (BERT), for argumentation domains with little in-domain training data. The proposed model fine-tunes BERT for two NLU tasks: intent classification and argument similarity. Intent classification identifies the intent, or user move, in an argument, while the argument similarity task checks the relation of the user utterance to the presented arguments. The experimental results show a clear advantage of our proposed approach over the baselines for both tasks on different datasets. Moreover, the outcomes indicate high and stable performance of the model on data from topics unseen during training and from users of different language proficiency.
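The intent-classification setup above amounts to a classification head trained on top of BERT's pooled sentence representation. The following is a minimal sketch of such a head with its gradient-descent update, not the thesis's actual implementation: the encoder outputs are simulated with fixed synthetic vectors, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class IntentHead:
    """Linear classification head over a pooled sentence embedding.

    In the full model this sits on top of BERT and both are fine-tuned
    jointly; here the encoder output is simulated with fixed vectors.
    """
    def __init__(self, hidden_size, num_intents, lr=0.1):
        self.W = rng.normal(0.0, 0.02, (hidden_size, num_intents))
        self.b = np.zeros(num_intents)
        self.lr = lr

    def forward(self, h):                    # h: (batch, hidden)
        return softmax(h @ self.W + self.b)  # intent probabilities

    def step(self, h, y):
        """One full-batch gradient-descent update; returns mean cross-entropy."""
        p = self.forward(h)
        onehot = np.eye(p.shape[1])[y]
        grad = (p - onehot) / len(y)         # d(cross-entropy)/d(logits)
        self.W -= self.lr * h.T @ grad
        self.b -= self.lr * grad.sum(axis=0)
        return -np.log(p[np.arange(len(y)), y]).mean()

# Toy "encoder outputs": two well-separated clusters, one per intent.
H = np.vstack([rng.normal(1.0, 0.1, (20, 8)),
               rng.normal(-1.0, 0.1, (20, 8))])
y = np.array([0] * 20 + [1] * 20)
head = IntentHead(hidden_size=8, num_intents=2)
losses = [head.step(H, y) for _ in range(100)]
```

On this separable toy data the loss decreases monotonically and the head reaches full training accuracy; with a real encoder, the same cross-entropy gradient is also backpropagated into BERT's weights.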
2) The second contribution augments the fine-tuning of a BERT-like architecture with a weighted finite-state transducer (WFST) to reduce the need for massive supervised data. The WFST-BERT model uses the pre-trained BERT architecture to generate contextual representations of user sentences and leverages regular expression (RE) rules by converting them into trainable weighted finite-state transducers. The BERT representations are then combined with the WFST and trained simultaneously on supervised data using a gradient descent algorithm. The experimental results show that WFST-BERT can produce decent predictions even when limited or no training examples are available.
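The core idea of combining RE knowledge with neural scores can be sketched as follows. This is a deliberate simplification of WFST-BERT, an assumption-laden illustration rather than the thesis's method: instead of compiling each RE into a transducer with trainable arc weights, each rule here carries a single weight that is added to the neural logit of its intent, so the rules can still drive predictions when the neural model is untrained. The rules and intent names are hypothetical.

```python
import re
import numpy as np

# Hypothetical RE rules, one per intent.  In WFST-BERT each rule is
# compiled into a weighted finite-state transducer whose weights are
# trained jointly with BERT; here each rule keeps one scalar weight.
RULES = {
    "book_flight": re.compile(r"\b(book|reserve)\b.*\bflight\b"),
    "find_restaurant": re.compile(r"\brestaurant\b|\bplace to eat\b"),
}
INTENTS = list(RULES)

def rule_logits(text, weights):
    """Logit bonus per intent: a rule contributes its weight if it matches."""
    return np.array([w if RULES[i].search(text.lower()) else 0.0
                     for i, w in zip(INTENTS, weights)])

def combined_prediction(neural_logits, text, weights):
    """Add rule scores to neural logits and pick the highest-scoring intent."""
    z = np.asarray(neural_logits) + rule_logits(text, weights)
    return INTENTS[int(np.argmax(z))]

# With an uninformative neural model (all-zero logits), the rules alone
# decide — mirroring the zero-shot behaviour the contribution describes.
pred = combined_prediction(np.zeros(2), "Please book a flight to Ulm", [2.0, 2.0])
```

Because the rule weights enter the logits additively, they can be updated by the same gradient-descent procedure as the network parameters.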
3) The third contribution introduces a multi-task learning model based on neural networks with contextual information for multi-turn intent detection and slot filling. We employ a memory network to model and exploit multi-turn information in the user conversation. The model extracts contextual word features from a pre-trained language model and employs CNN and RNN structures for predicting user intent and tagging the corresponding slots, respectively. Furthermore, the model integrates regular expression (RE) rules to encode domain knowledge and regulate the neural network output in an end-to-end trainable manner.
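A memory network of the kind used here reads previous-turn encodings through attention. The following is a minimal sketch of that read step under the assumption that turn encodings are already available as vectors; the actual model's encoder, CNN/RNN heads, and RE integration are omitted.

```python
import numpy as np

def attend(query, memory):
    """Dot-product attention over encodings of previous dialogue turns.

    query:  (d,)    encoding of the current utterance
    memory: (t, d)  encodings of earlier turns in the conversation
    Returns a context vector used to condition intent and slot prediction.
    """
    scores = memory @ query                       # similarity of each turn to the query
    scores = scores - scores.max()                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ memory                       # convex combination of turn encodings

# Toy example: turn 0 resembles the current utterance, turn 1 does not,
# so the context vector leans toward turn 0.
query = np.array([1.0, 0.0])
memory = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
context = attend(query, memory)
```

The context vector is then concatenated with the current utterance features before the intent and slot heads, which lets information from earlier turns disambiguate the current one.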
We evaluated the proposed methods on publicly available single-turn datasets (ATIS, SNIPS, and Banking) and multi-turn datasets (Key-Value Retrieval and Frames). The empirical results demonstrate that the proposed models outperform baseline methods in both limited-data and full-data settings. This dissertation was written in the context of a cooperation with Southeast University, China.
Date created
2022
Subject headings
[GND]: Mensch-Maschine-Kommunikation
[LCSH]: Regular expressions (Computer science) | Human-computer interaction
[Free subject headings]: Natural Language Understanding | Spoken Dialogue System | Deep Neural Networks
[DDC subject group]: DDC 004 / Data processing & computer science
Metadata
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-46912
Ahmed, Waheed (2023): Combining Neural Networks with Knowledge for Spoken Dialogue Systems. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-46912