Fault Tolerance in a Distributed Operating System with Transactional Memory (Fehlertoleranz in einem verteilten Betriebssystem mit transaktionalem Speicher)
Faculties: Faculty of Engineering and Computer Science (Fakultät für Ingenieurwissenschaften und Informatik)
Creating checkpoints of a distributed cluster operating system is a non-trivial task, because cluster nodes must coordinate specially to avoid the domino effect. This additional coordination effort of distributed checkpointing can be greatly reduced by using distributed transactional memory as the communication medium between cluster nodes. Based on Rainbow OS, a 64-bit cluster operating system, this work presents an approach to efficient distributed checkpointing that runs concurrently with cluster operations and has minimal impact on them. Checkpoint data is saved to solid-state drive (SSD) storage using a specifically tailored algorithm that maximizes checkpointing performance while still providing object-oriented access. This work also offers an approach to include device driver data in checkpoints and to restore it consistently when, after a system failure, the system falls back to a stored checkpoint.
Subject Headings: Fehlertoleranz (fault tolerance) [GND]
Verteiltes Betriebssystem (distributed operating system) [GND]
Electronic data processing; Distributed processing [LCSH]
Fault-tolerant computing [LCSH]
Operating systems (Computers) [LCSH]