Distributed file systems offer many compelling advantages in establishing high-performance computing environments. One example is the ability to expand easily, even at large scale. An example distributed file system is one that is distributed across multiple nodes in a cluster of nodes. An individual node can encompass a set of storage drives capable of storing data accessible by clients of the cluster of nodes. In some distributed file systems, files or objects can be striped across different physical storage devices in a single node or across multiple nodes of a cluster. Because multiple clients can interact with multiple nodes and request operations on files, operations are often executed by multiple threads, processes, and applications distributed across one or more nodes. For example, more than one thread or process may attempt to write data to the same file in a file system concurrently.
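To make striping concrete, the sketch below maps a byte offset within a file to the node holding it. It assumes simple round-robin striping with a fixed stripe unit size; the names `locate` and `StripeLocation`, the stripe unit size, and the node count are all hypothetical. Real systems may place stripes differently, for example with protection groups or erasure coding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLocation:
    node: int            # index of the node holding this stripe unit
    unit_index: int      # which stripe unit of the file the offset falls in
    offset_in_unit: int  # byte offset within that stripe unit


def locate(offset: int, stripe_unit: int, num_nodes: int) -> StripeLocation:
    """Map a file byte offset to a node, assuming hypothetical round-robin striping."""
    if offset < 0 or stripe_unit <= 0 or num_nodes <= 0:
        raise ValueError("offset must be >= 0; stripe_unit and num_nodes must be > 0")
    unit_index = offset // stripe_unit
    return StripeLocation(
        node=unit_index % num_nodes,       # consecutive units rotate across nodes
        unit_index=unit_index,
        offset_in_unit=offset % stripe_unit,
    )


if __name__ == "__main__":
    # 128 KiB stripe units spread over a hypothetical 3-node cluster.
    for off in (0, 131072, 300000, 400000):
        print(off, locate(off, 131072, 3))
```

Because consecutive stripe units land on different nodes, writes to different regions of one file can be serviced by different nodes, which is why several threads or processes on several nodes may end up operating on the same file at once.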
File system locks can be used to allow one client to access data under locking conditions that prevent another client from performing certain operations on the locked data. For example, readers usually use locks that prevent the file from changing while it is being read. Because readers do not change a file, multiple readers can hold such locks on the same file at the same time, each preventing the file from being altered while it is being read. Writers usually use exclusive locks so that a writer can alter the file without being concerned about the actions of other writers or readers. Accordingly, a writer that attempts to write to a file must wait until other lock holders (e.g., readers or writers) have finished with the file and released their locks. Once all locks are released, the writer can obtain its own exclusive lock for writing to the file.
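The shared (reader) and exclusive (writer) semantics described above can be sketched as a minimal reader/writer lock. This is an illustration under stated assumptions, not a file system's actual locking code: the class and method names are hypothetical, and the sketch makes no fairness guarantee, so a steady stream of readers could starve a waiting writer.

```python
import threading


class SharedExclusiveLock:
    """Hypothetical reader/writer lock: many shared holders or one exclusive holder."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0     # number of current shared (reader) holders
        self._writer = False  # True while an exclusive (writer) holder exists

    def acquire_shared(self) -> None:
        with self._cond:
            # Readers are compatible with each other; they only wait for a writer.
            self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_exclusive(self) -> None:
        with self._cond:
            # A writer waits until every other lock holder has released.
            self._cond.wait_for(lambda: not self._writer and self._readers == 0)
            self._writer = True

    def try_acquire_exclusive(self) -> bool:
        with self._cond:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake both waiting readers and writers


if __name__ == "__main__":
    lock = SharedExclusiveLock()
    lock.acquire_shared()
    lock.acquire_shared()                # a second reader is admitted
    print(lock.try_acquire_exclusive())  # False: readers still hold the file
    lock.release_shared()
    lock.release_shared()
    print(lock.try_acquire_exclusive())  # True: all locks have been released
    lock.release_exclusive()
```

The blocking `acquire_exclusive` mirrors the waiting writer in the text, while `try_acquire_exclusive` shows the same compatibility check without blocking.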
In a distributed file system, such as one implemented on a cluster of nodes, file system operations can be viewed as multi-layered. The first layer decodes what the operation is trying to accomplish, including determining which nodes of the cluster (including the node processing the operation) store data on which the operation depends. As the operation progresses, a journal can be used to provide additional guarantees for requested operations: the operations are first logged into the journal (e.g., an in-memory storage space such as a cache or a buffer cache) and are later committed from the journal to stable disk storage. Most journal entries involve either file data blocks or metadata blocks. Some journal entries, such as those relating to metadata blocks, affect only a small portion, or a small set of sub-blocks, of a metadata block. However, to process a journal entry that affects a set of sub-blocks, for example an order-dependent operation, a lock may have to be taken on the entire block, which prevents other operations from being processed on sub-blocks unrelated to the journal entry in question. Thus, there exists a need to support concurrent, compatible data operations on different, non-overlapping ranges of a single journal block.
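The contrast between locking a whole journal block and locking only the affected sub-blocks can be sketched with a simple range lock. The text does not specify a mechanism, so everything here is an assumption: the class `SubBlockRangeLock`, the use of half-open sub-block index ranges `[start, end)`, and the shared/exclusive modes are hypothetical. Two holders conflict only when their ranges overlap and at least one of them is exclusive, so operations on disjoint ranges of the same block can proceed concurrently.

```python
import threading


class SubBlockRangeLock:
    """Hypothetical range lock over the sub-blocks of a single journal block."""

    def __init__(self, num_sub_blocks: int) -> None:
        self._num = num_sub_blocks
        self._cond = threading.Condition()
        self._held = []  # granted locks as (start, end, exclusive) tuples

    def _check(self, start: int, end: int) -> None:
        if not 0 <= start < end <= self._num:
            raise ValueError("invalid sub-block range")

    def _conflicts(self, start: int, end: int, exclusive: bool) -> bool:
        for s, e, x in self._held:
            overlaps = start < e and s < end  # half-open ranges intersect
            if overlaps and (exclusive or x):
                return True
        return False

    def acquire(self, start: int, end: int, exclusive: bool = True) -> tuple:
        """Block until [start, end) is compatible with all granted locks."""
        self._check(start, end)
        with self._cond:
            self._cond.wait_for(lambda: not self._conflicts(start, end, exclusive))
            entry = (start, end, exclusive)
            self._held.append(entry)
            return entry

    def try_acquire(self, start: int, end: int, exclusive: bool = True):
        """Return a lock entry if granted immediately, else None."""
        self._check(start, end)
        with self._cond:
            if self._conflicts(start, end, exclusive):
                return None
            entry = (start, end, exclusive)
            self._held.append(entry)
            return entry

    def release(self, entry: tuple) -> None:
        with self._cond:
            self._held.remove(entry)
            self._cond.notify_all()  # waiters re-check their ranges


if __name__ == "__main__":
    lock = SubBlockRangeLock(16)        # one block split into 16 sub-blocks
    a = lock.try_acquire(0, 4)          # operation A: sub-blocks 0-3
    b = lock.try_acquire(8, 12)         # operation B: disjoint, granted too
    print(a is not None, b is not None) # True True
    print(lock.try_acquire(2, 6))       # None: overlaps operation A
```

With a single whole-block lock, operation B would have waited for operation A even though they touch different sub-blocks; with range locks, only the genuinely overlapping request is deferred.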