The embodiments described herein relate generally to replica files stored at replica nodes in a filesystem. More specifically, the embodiments described herein relate to recovering from replica node failure using the stored replica files.
A filesystem is a term of art referring to the structure and logic rules for managing data (i.e., files). Specifically, filesystems are used to control how data is stored, retrieved, and updated. One type of filesystem is a distributed storage system, in which replicated versions of a file are stored on multiple replica nodes. Distributed storage systems are used in scenarios in which high-performance data analytics is required over large datasets. Key challenges for managing such “big data” workloads within a distributed storage system include providing low-overhead durability or persistence to enable fast runtime performance, and reducing downtime for the distributed storage system during recovery from a failure at a replica node.
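The replication described above can be sketched as follows. This is a minimal, illustrative sketch only; the class and function names (`ReplicaNode`, `replicated_write`) are hypothetical and not drawn from any particular distributed storage system.

```python
# Illustrative sketch (hypothetical API): writing the same file to
# multiple replica nodes so that each node holds a replica of the file.

class ReplicaNode:
    """A hypothetical replica node holding copies of files."""
    def __init__(self, name):
        self.name = name
        self.files = {}          # filename -> file contents

    def store(self, filename, data):
        self.files[filename] = data

def replicated_write(nodes, filename, data):
    """Store the same file at every replica node for durability."""
    for node in nodes:
        node.store(filename, data)

nodes = [ReplicaNode(f"node{i}") for i in range(3)]
replicated_write(nodes, "log.dat", b"payload")
# Every node now holds an identical replica of log.dat.
```

If one replica node fails, the surviving replicas can serve the file, which is the property that the recovery techniques described herein build upon.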
Traditionally, filesystem durability is provided through the use of logs, which are referred to herein as journals. Journals are used to keep track of intended changes to a filesystem such that, in the event of a filesystem crash or other failure, the filesystem can be returned to proper operation more quickly and with a lower likelihood of corruption. However, utilizing journals in a distributed storage system results in poor performance and slow recovery. This is primarily because each node of the distributed storage system introduces a set of additional writes and cache flushes for ordering and persisting updates to storage. However, without journaling, filesystem recovery generally requires a complete scan of the address space of the storage system. This increases downtime because the filesystem cannot be mounted until filesystem metadata has been successfully reconstructed during a filesystem consistency check that verifies crash consistency of lost updates. Additionally, the replicas of the distributed storage system need to be reconciled with each other to account for version consistency across the replica nodes. Both crash and version consistency checks result in performance loss and slow recovery for distributed storage systems. Moreover, the additional writes reduce the lifetime of wear-limited storage devices such as flash or solid-state drive (SSD) memory.
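The journaling principle above can be sketched as follows. This is a minimal, illustrative sketch under simplifying assumptions (an in-memory append-only list stands in for a persisted journal, and a dictionary stands in for filesystem metadata); the names `journaled_update` and `recover` are hypothetical.

```python
# Illustrative sketch of journaling: each intended update is appended to
# the journal before being applied, so after a crash the filesystem
# state can be rebuilt by replaying the journal instead of scanning the
# entire address space of the storage system.

journal = []                     # append-only log of intended changes
state = {}                       # in-memory filesystem metadata

def journaled_update(key, value):
    journal.append((key, value)) # 1. record the intent first
    state[key] = value           # 2. then apply the update

def recover():
    """Replay the journal in order to reconstruct state after a crash."""
    recovered = {}
    for key, value in journal:
        recovered[key] = value
    return recovered

journaled_update("inode/7", "size=4096")
journaled_update("inode/7", "size=8192")
assert recover() == state        # replay yields a consistent state
```

The cost noted in the passage is visible even in this sketch: every update incurs an extra journal write (and, on real storage, a cache flush for ordering), which is the overhead that motivates the alternative recovery approach of the embodiments described herein.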