A clustered file system (CFS) is a file system that can be mounted and accessed by multiple client nodes (such as multiple host systems in a cluster) concurrently. As part of providing concurrent access, a CFS needs to ensure that each client node has a consistent view of the file system's metadata. Examples of such file system metadata include file names, directory information, and file and directory attributes.
Existing CFSs generally guarantee metadata consistency via a process known as journaling. Journaling consists of two phases: a commit phase and a replay phase. During the commit phase, the CFS processes an update to a file system metadata resource by applying the update to a version of the resource in memory (e.g., in system RAM) and then writing the modified in-memory resource to an on-disk journal. During the replay phase, the CFS propagates the metadata resource recorded in the journal to the actual location(s) of that resource on disk. With this two-phase approach, the CFS can ensure that the on-disk version of the metadata resource remains in, or can be restored to, a consistent state in various system or network failure scenarios.
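The two phases described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of any particular CFS: plain Python dictionaries and a list stand in for the on-disk locations, the in-memory copies, and the on-disk journal, and the names `ToyJournaledStore`, `commit`, and `replay` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournaledStore:
    """Toy model of journaling; dictionaries stand in for on-disk structures."""

    disk: dict = field(default_factory=dict)     # actual on-disk location(s)
    journal: list = field(default_factory=list)  # stand-in for the on-disk journal
    memory: dict = field(default_factory=dict)   # in-memory (RAM) versions

    def commit(self, resource_id, update):
        # Commit phase: apply the update to the in-memory version of the
        # resource, then write the modified in-memory resource to the journal.
        base = self.memory.get(resource_id, self.disk.get(resource_id, {}))
        current = dict(base)
        current.update(update)
        self.memory[resource_id] = current
        self.journal.append((resource_id, dict(current)))

    def replay(self):
        # Replay phase: propagate each journaled resource to its actual
        # location. An entry is retired only after it has been propagated
        # (an assumption of this sketch), so an interrupted replay can be
        # repeated safely.
        while self.journal:
            resource_id, recorded = self.journal[0]
            self.disk[resource_id] = dict(recorded)
            self.journal.pop(0)


store = ToyJournaledStore()
store.commit("inode-7", {"name": "a.txt"})
store.memory.clear()   # simulate losing RAM contents after the commit
store.replay()         # the journal alone is enough to restore the resource
```

In this sketch, the resource reaches its final location only during replay, yet a failure after the commit loses nothing because the journal entry survives and can be replayed later.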
While the commit and replay phases are occurring, the CFS locks the metadata resource using an on-disk lock so that it cannot be accessed by clients during that period. To minimize the amount of time that the metadata resource is locked, existing CFSs typically perform the commit and replay phases synchronously (i.e., the replay phase runs immediately after the commit phase). Once the replay phase is finished, the CFS returns an “I/O complete” acknowledgement to the client application that initiated the I/O causing the metadata update. Unfortunately, this means that the initiating application must wait for both phases to complete before moving on with its program execution, which increases the latency of its I/O operations and reduces overall application performance.
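The synchronous ordering described above can be sketched as follows. This is an illustrative model only: a `threading.Lock` stands in for the on-disk lock, dictionaries stand in for disk and journal, and the class `ToySyncCfs`, its `update_metadata` method, and the `events` trace are hypothetical names introduced for this example. Releasing the lock before returning the acknowledgement is also an assumption of the sketch.

```python
import threading


class ToySyncCfs:
    """Toy model of a CFS that commits and replays synchronously."""

    def __init__(self):
        self.disk = {}      # actual on-disk location(s)
        self.journal = []   # stand-in for the on-disk journal
        self.memory = {}    # in-memory versions of resources
        self.locks = {}     # resource_id -> lock (stand-in for on-disk locks)
        self.events = []    # ordered trace of the steps taken

    def update_metadata(self, resource_id, update):
        lock = self.locks.setdefault(resource_id, threading.Lock())
        with lock:  # the resource is inaccessible for both phases
            self.events.append("lock")

            # Commit phase: update the in-memory copy, then journal it.
            base = self.memory.get(resource_id, self.disk.get(resource_id, {}))
            current = dict(base)
            current.update(update)
            self.memory[resource_id] = current
            self.journal.append((resource_id, dict(current)))
            self.events.append("commit")

            # Replay phase, run immediately after the commit.
            while self.journal:
                rid, recorded = self.journal.pop(0)
                self.disk[rid] = recorded
            self.events.append("replay")
        self.events.append("unlock")

        # The caller is acknowledged only after both phases have finished.
        self.events.append("ack")
        return "I/O complete"


cfs = ToySyncCfs()
result = cfs.update_metadata("dir-1", {"mode": 0o755})
```

The `events` trace makes the cost visible: the acknowledgement is the last step, so the caller's latency includes the commit and the replay.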