Novel methods for text preprocessing and classification
Dissertation
Faculties
Fakultät für Ingenieurwissenschaften und Informatik
Abstract
Written text is a form of communication that represents language (speech) using signs and symbols. For a given language, text depends on the same structures as speech (vocabulary, grammar and semantics) and on a structured system of signs and symbols (a formal alphabet).
Written text has always been an instrument for exchanging information, recording history, spreading knowledge, maintaining financial accounts and forming legal systems.
With the development of computers and the Internet, the amount of textual information in digital form has grown dramatically. There is an increasing need to process this information automatically for a variety of text processing tasks, such as information retrieval, machine translation, question answering, topic categorization, topic segmentation and sentiment analysis. Many important text processing tasks fall into the field of text classification.
This thesis addresses the development and evaluation of novel text preprocessing methods that combine supervised and unsupervised learning models in order to reduce the dimensionality of the feature space and improve classification performance. Metaheuristic approaches for Support Vector Machine and Artificial Neural Network generation and parameter optimization are modified, applied to text classification and compared with other state-of-the-art methods using different text representations.
Date created
2015
Subject headings
[GND]: Automatische Klassifikation
[LCSH]: Text processing (Computer science)
[Free subject headings]: Text classification | Text preprocessing
[DDC subject group]: DDC 000 / Computer science, information & general works
DOI & citation
Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-3242
Gasanova, Tatiana (2015): Novel methods for text preprocessing and classification. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-3242