Lexical and language modeling for Russian large vocabulary continuous speech recognition
Faculties: Fakultät für Ingenieurwissenschaften und Informatik
This thesis presents novel approaches for improving Russian large vocabulary continuous speech recognition. Several peculiarities of Russian pose serious challenges for speech recognition, and the most severe of them are tackled in this work. First, the phonetic transcription of a Russian word depends strongly on the position of its stressed vowel, yet there are no general rules for locating it. Two different methods are therefore proposed to overcome this problem. Second, the non-trivial Russian grammar greatly complicates text normalization, which is essential for language modeling. During normalization, most numerals and abbreviations must be declined into the proper grammatical case, and the very loose word order of Russian sentences makes this a challenging task. Since, to the best of our knowledge, no solution with satisfactory functionality was available, we designed and implemented an advanced tool for Russian text normalization from scratch and made it publicly available. Third, Russian is a highly inflected language with a complex mechanism of word formation, so a very large lexicon is required to recognize fluent speech in any broad domain. The main part of our investigation is devoted to this problem, as it is the most challenging and extensive one. Hybrid sub-word lexical and language models were used, and several important sub-word modeling parameters were investigated: the type of unit, and the optimal number and size of units. Moreover, three algorithms for joining small elements were proposed and evaluated. One of the most important proposals of this thesis is the use of double-sided marking for sub-word units. Most of the suggested approaches are, in principle, applicable not only to Russian but also to all highly inflected synthetic languages, for example the other Slavic languages.
Subject Headings: Automatische Spracherkennung [GND]
Russian language; Data processing [LCSH]
Speech perception [LCSH]