• English
    • Deutsch
  • English 
    • English
    • Deutsch
  • Login
View Item 
  •   Home
  • Universität Ulm
  • Publikationen
  • View Item
  •   Home
  • Universität Ulm
  • Publikationen
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Compressed suffix trees: design, construction, and applications

Thumbnail
vts_7786_11228.pdf (2.060Mb)
170 S.
Veröffentlichung
2011-12-05
Authors
Gog, Simon
Dissertation


Faculties
Fakultät für Ingenieurwissenschaften und Informatik
Abstract
In sequence analysis it is often advantageous to build an index data structure for large texts, as many tasks -- for instance repeated pattern matching -- can then be solved in optimal time. The most popular and important index data structure for many years was the suffix tree, which was proposed in the early 1970s by Weiner and was later mproved by McCreight. Gusfield showed in his textbook the power of the suffix tree by presenting over 20 problems which can be solved in optimal time complexity with ease by using the suffix tree. Despite its good properties -- the suffix tree can be constructed in linear time for a text over a constant size alphabet, and most operations can be performed in constant time -- it has a severe drawback. Its space consumption in practice is at least 17 times the text size. Therefore compressed suffix trees (CSTs) were proposed. In theory CSTs use at most two times the text size and provide all functionality of its uncompressed counterpart in essentially the same time complexity. In this work we focus on the practical implementation of CSTs, but also provide new theoretical suggestions for CST components. The presented design makes it possible to get myriads of CSTs with different time-space trade-offs. We show in an experimental study that the space of our CSTs is close to the theoretically expected space, and the runtime of the operations is faster -- sometimes by orders of magnitudes -- than the runtime of previous implementations. We have also improved the runtime of the construction process by an order of magnitude by using state of the art construction algorithms and a new cache-friendly algorithm for the construction of the longest common prefix array. Finally, we show that our CST implementation can replace the use of uncompressed suffix trees in some applications.
Date created
2011
Subject headings
[GND]: Algorithmus | Datenkompression | Datenstruktur
[LCSH]: Algorithms | Data compression: Computer science | Data structures: Computer science | Indexing
[Free subject headings]: Succinct data structures
[DDC subject group]: DDC 004 / Data processing & computer science
License
Standard (Fassung vom 01.10.2008)
https://oparu.uni-ulm.de/xmlui/license_v2

Metadata
Show full item record

DOI & citation

Please use this identifier to cite or link to this item: http://dx.doi.org/10.18725/OPARU-3911

Gog, Simon (2011): Compressed suffix trees: design, construction, and applications. Open Access Repositorium der Universität Ulm und Technischen Hochschule Ulm. Dissertation. http://dx.doi.org/10.18725/OPARU-3911
Citation formatter >



Policy | kiz service OPARU | Contact Us
Impressum | Privacy statement
 

 

Advanced Search

Browse

All of OPARUCommunities & CollectionsPersonsInstitutionsPublication typesUlm SerialsDewey Decimal ClassesEU projects UlmDFG projects UlmOther projects Ulm

My Account

LoginRegister

Statistics

View Usage Statistics

Policy | kiz service OPARU | Contact Us
Impressum | Privacy statement