Distributed file systems offer many compelling advantages for building high-performance computing environments. One advantage is the ability to expand easily, even at large scale. Another is the ability to store different types of data that are accessible by different types of clients using different protocols. A distributed file system can operate on a cluster of nodes, allowing a client to connect to any node of the cluster to access and/or modify data residing on any node of the cluster.
Events within the file system can create a new file, change the permissions on a file, change the metadata of a file, and so on. These events can happen in parallel and can be processed on different nodes of the cluster. The distributed file system may have locking mechanisms in place that prevent access to a file while another user or file system process is modifying it. However, as events that change file metadata are processed, it is desirable to know the causal order of the processed actions, regardless of which node processed each individual event.
One way to order events is to associate a timestamp with each event. For example, one timestamp could be recorded when a file is created on Node A, and a second when the permissions of the same file are changed on Node B. Permissions cannot be changed on a file before it has been created; however, if the two events are processed within a very small time window, clocks that are not perfectly synchronized between the nodes could introduce enough error that the permission change appears to precede the creation, producing an inconsistent order.
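The failure mode described above can be reproduced in a few lines. The sketch below assumes Node A's clock runs 5 ms fast while Node B's is accurate, and that the permission change happens 2 ms (in true time) after the creation; the offsets, the file path `/data/f`, and the `stamp` helper are hypothetical values chosen for illustration. Sorting by the recorded wall-clock timestamps then places the permission change before the file exists.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str
    action: str
    timestamp_us: int  # wall-clock reading taken on the processing node


# Hypothetical per-node clock offsets (microseconds) relative to true time.
CLOCK_OFFSET_US = {"A": 5_000, "B": 0}  # Node A's clock runs 5 ms fast


def stamp(node: str, action: str, true_time_us: int) -> Event:
    """Record an event using the node's (possibly skewed) local clock."""
    return Event(node, action, true_time_us + CLOCK_OFFSET_US[node])


# True order: the file is created on Node A, then 2 ms later
# its permissions are changed on Node B.
create = stamp("A", "create /data/f", 1_000_000)
chmod = stamp("B", "chmod /data/f", 1_002_000)

# Ordering by timestamp inverts the true sequence because of the 5 ms skew.
ordered = sorted([create, chmod], key=lambda e: e.timestamp_us)
print([e.action for e in ordered])  # → ['chmod /data/f', 'create /data/f']
```

Any skew larger than the gap between the two events is enough to invert them, which is why tighter clock synchronization narrows, but never eliminates, the window.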
In another example, even if the clocks on all nodes of the cluster were perfectly synchronized, the chronological order of events may be less important, and less useful, than their causal order. In a distributed file system, some events that are processed in one chronological order may have a different causal order. For instance, in a non-native Hadoop Distributed File System (“HDFS”) environment, operations can be processed in a chronological order that a traditional HDFS environment would not allow, due to differences in how data is organized and processed in each environment. Therefore, there exists a need to guarantee a causal order of events in a distributed file system.
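Causal order, unlike chronological order, does not depend on synchronized clocks. One well-known general technique for capturing it is the Lamport logical clock: each node keeps a counter, increments it for every event it processes, and, on learning of another node's event, advances its counter past that event's value. This guarantees that if event *a* causally precedes event *b*, then *a* receives a smaller clock value than *b* (the converse does not hold). The sketch below illustrates that general technique only and is not necessarily the mechanism this document uses; the `Node` class, the assumption that Node B learns of the creation when it acquires the file's lock, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node holding a Lamport logical clock (illustrative only)."""
    name: str
    counter: int = 0

    def local_event(self, action: str) -> tuple[int, str, str]:
        # Every event a node processes advances its logical clock by one.
        self.counter += 1
        return (self.counter, self.name, action)

    def receive(self, sender_counter: int) -> None:
        # On learning of another node's event, jump to at least that event's
        # clock value; the next local event is then numbered strictly after it.
        self.counter = max(self.counter, sender_counter)


node_a, node_b = Node("A"), Node("B")
node_a.local_event("mkdir /data")        # Node A has already done other work
node_a.local_event("write /data/log")
create = node_a.local_event("create /data/f")  # (3, 'A', 'create /data/f')

# Hypothetically, Node B learns of the creation when it acquires the file's
# lock, and only then changes the file's permissions.
node_b.receive(create[0])
chmod = node_b.local_event("chmod /data/f")    # (4, 'B', 'chmod /data/f')

# Sort by (logical clock, node name); the name breaks ties between
# concurrent events so the result is a total order.
ordered = sorted([chmod, create])
print([action for _, _, action in ordered])  # → ['create /data/f', 'chmod /data/f']
```

Without the `receive` call, Node B's permission change would be stamped 1 and sorted before the creation (stamped 3); the merge step is what carries the causal dependency across nodes, independent of either node's wall clock.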