Algorithms and a software system for comparative genome analysis
Faculties: Fakultät für Informatik
License: Standard (version of 03.05.2003)
Comparative genomics is a relatively new field of bioinformatics concerned with comparing the genomic sequences of different organisms with each other. The results of these comparisons are important for understanding how genomes function, are organized, and evolve. Comparing whole genomes essentially means finding the regions of similarity and difference among the given genomic sequences. This is a very complicated task due to the sheer volume of data and the structure of the genomic sequences. The thesis at hand presents algorithms and data structures for comparing two or more genomes based on the anchor-based strategy. Our contribution not only increases the efficiency of this strategy, but also extends its range of application to various comparative genomics tasks. More precisely, we introduce a novel indexing data structure, which we call the "enhanced suffix array". The use of this data structure tremendously speeds up the analysis of the given genomic sequences. Moreover, we introduce chaining algorithms that are used to determine the similar regions among the genomic sequences. Our algorithms are integrated in a software system called CHAINER. We show, using real biological data, how to use this system for a number of comparative genomics tasks. Our system makes it possible to compare large genomic sequences in a few hours, and small bacterial genomes in a few minutes, using reasonable computational resources.
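To make the chaining step of the anchor-based strategy concrete, the following is a minimal sketch and not the algorithm of the thesis or of CHAINER. It assumes that anchors are matching segments between two genomes, represented by a hypothetical `Anchor` record with start and end positions in each sequence and a weight, and it uses a naive quadratic dynamic program to select the highest-scoring chain of colinear, non-overlapping anchors. The names `Anchor`, `precedes`, and `best_chain` are illustrative assumptions.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """Hypothetical anchor: a matching segment in genome 1 and genome 2."""
    start1: int
    end1: int
    start2: int
    end2: int
    weight: float


def precedes(a: Anchor, b: Anchor) -> bool:
    """True if a ends strictly before b starts in both genomes."""
    return a.end1 < b.start1 and a.end2 < b.start2


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return a maximum-weight chain of colinear anchors (naive O(n^2) DP)."""
    # Sorting by start position in genome 1 guarantees that any possible
    # predecessor of an anchor appears before it (assuming start <= end).
    order = sorted(anchors, key=lambda a: (a.start1, a.start2))
    n = len(order)
    if n == 0:
        return []
    score = [a.weight for a in order]  # best chain score ending at each anchor
    prev = [-1] * n                    # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            candidate = score[i] + order[j].weight
            if precedes(order[i], order[j]) and candidate > score[j]:
                score[j] = candidate
                prev[j] = i
    # Trace back from the anchor that ends the best-scoring chain.
    j = max(range(n), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    chain.reverse()
    return chain
```

For example, given four anchors of equal weight where one of them is out of order in the second genome, `best_chain` would keep the three colinear anchors and drop the conflicting one. The quadratic loop is chosen here only for readability.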
Subject Headings: Genanalyse [GND]
Verkettung <Informatik> [GND]
Genomics. Data processing [LCSH]
Sequence analysis [MeSH]