In certain approaches to implementing a file system, operating system, distributed file system or database (hereinafter collectively referred to for convenience as a “file system”), metadata may be employed to represent the state of the name-space for data objects, including files, logical volumes, and other such objects (hereinafter collectively referred to for convenience as “files”), stored in a storage system associated with the file system. A storage system is a collection of storage devices together with a controller. This metadata can include mapping information that maps the various parts of a data object to the physical locations of those parts, as stored in the storage devices that make up the storage system. An example of this type of storage system is disclosed in U.S. patent application Ser. No. 09/177,916, filed on Oct. 23, 1998, which is hereby incorporated by reference in its entirety. Many file systems contain multiple distributed nodes, with each node being a discrete sub-system that interacts with the storage system.
Each file is associated with metadata that identifies which storage devices contain the various parts of the file, and where on each device those parts are located. This metadata is typically stored on the storage system. File system nodes will from time to time access the data pointed to by the metadata. In order to access the data more efficiently, the nodes will cache local copies of the metadata. A problem arises, however, when the stored data is moved from one location within the storage system to another. All cached copies of the metadata must be updated with the new location of the data, to allow proper access to the data. One approach to informing the file system nodes of the changes to the metadata involves messages sent among all the nodes having copies of the metadata. The messages either update the file system nodes with the new location of the data in the storage system, or merely inform the nodes that the cached metadata pointing to the data is no longer valid. In the latter case, each node is responsible for refreshing its cached metadata from the primary metadata associated with the data.
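The message-based approach described above can be illustrated with a minimal sketch. Everything named here (`StorageSystem`, `Node`, `move_data`, the `(device, offset)` location tuples) is a hypothetical illustration and not part of any disclosed system; the sketch assumes a single in-process "cluster" in which a message is modeled as a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSystem:
    # Primary metadata: file name -> (device, offset). Hypothetical layout.
    primary: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    storage: StorageSystem
    cache: dict = field(default_factory=dict)  # local copy of metadata
    messages_received: int = 0

    def lookup(self, file_name):
        # Serve from the local cache; on a miss, read the primary metadata.
        if file_name not in self.cache:
            self.cache[file_name] = self.storage.primary[file_name]
        return self.cache[file_name]

    def on_update(self, file_name, new_location):
        # "Update" message: carries the new location of the data.
        self.messages_received += 1
        self.cache[file_name] = new_location

    def on_invalidate(self, file_name):
        # "Invalidate" message: the node must re-read primary metadata later.
        self.messages_received += 1
        self.cache.pop(file_name, None)


def move_data(storage, nodes, file_name, new_location, push_update=True):
    # Record the move in primary metadata, then message every node.
    storage.primary[file_name] = new_location
    for node in nodes:
        if push_update:
            node.on_update(file_name, new_location)
        else:
            node.on_invalidate(file_name)
```

In this sketch, each move costs one message per node regardless of whether that node ever uses the metadata again, which is the cost the following paragraph discusses.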
Unfortunately, using messages to update or invalidate the cached copies of the metadata is relatively expensive and prevents the file system from achieving optimum performance. The volume of messages being transmitted in a file system with a large number of nodes quickly becomes large enough to significantly impact overall performance. Every node containing a cached copy of the metadata pointing to relocated or deleted data, or in some systems every node in the cluster, is updated. As a result, many unnecessary messages are sent to nodes that wind up discarding the updated metadata before they ever use it. Furthermore, messages can be missed if a new node starts up during the moving process, after the message was sent out but before the primary metadata was updated to reflect the new location.
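The missed-message window described above can be shown with a small simulation. This is only a sketch under stated assumptions: the `Cluster` class, its method names, and the string locations are hypothetical, and the ordering (broadcast first, primary metadata updated afterwards) is the one the paragraph describes as problematic.

```python
class Cluster:
    def __init__(self):
        self.primary = {}       # primary metadata: file name -> location
        self.caches = {}        # node name -> that node's cached metadata
        self.messages_sent = 0

    def join(self, node):
        # A starting node fills its cache from the primary metadata.
        self.caches[node] = dict(self.primary)

    def broadcast_new_location(self, file_name, location):
        # One message per node that is currently a member of the cluster.
        for cache in self.caches.values():
            cache[file_name] = location
            self.messages_sent += 1

    def commit_primary(self, file_name, location):
        self.primary[file_name] = location


def demo_missed_message():
    cluster = Cluster()
    cluster.primary["f"] = "dev0:100"
    for node in ("a", "b", "c"):
        cluster.join(node)

    cluster.broadcast_new_location("f", "dev1:200")  # messages go out first
    cluster.join("late")                             # node starts in the window
    cluster.commit_primary("f", "dev1:200")          # primary updated afterwards

    # Nodes whose cached location disagrees with the primary metadata.
    stale = [n for n, c in cluster.caches.items()
             if c["f"] != cluster.primary["f"]]
    return stale, cluster.messages_sent
```

Here the node that joins inside the window copies the old location from the primary metadata and never receives the broadcast, so it is left holding a stale mapping even though three messages were sent.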
Another common concern in distributed file systems is called the “split-brain syndrome”. Split-brain syndrome arises when an error causes communication with one or more nodes, in a cluster of nodes sharing storage devices, to be lost. An operational node in the cluster has no easy way of determining whether a node with which it cannot communicate has crashed or is still operating. When this condition occurs, there is no secure way for the nodes to serialize file system metadata updates, because global messaging capability is lost. Thus, there is the danger that two processes will attempt to update the metadata associated with a particular allocation unit simultaneously, causing data corruption. Various hardware extensions to storage devices have been proposed that permit inhibiting access to a device by a node. However, these mechanisms restrict all access to the device from all processes operating on the failing node, not just the processes related to the failed application in the cluster.
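The corruption risk can be sketched as an unserialized update to a shared allocation map. This is a deliberately simplified, hypothetical model: `allocate`, the unit number, and the owner labels are illustrative names, and the two "nodes" are represented only by the private views they read before communication was lost.

```python
def allocate(view, shared_map, unit, owner):
    # The node decides from its own (possibly stale) view, then writes
    # directly to shared storage without coordinating with other nodes.
    if view.get(unit) is None:
        shared_map[unit] = owner
        return True
    return False


def demo_split_brain():
    shared = {7: None}        # allocation unit 7 is free on shared storage
    view_a = dict(shared)     # node A's view, read before the partition
    view_b = dict(shared)     # node B's view, read before the partition

    ok_a = allocate(view_a, shared, 7, "file_A")
    ok_b = allocate(view_b, shared, 7, "file_B")
    return ok_a, ok_b, shared[7]
```

Both calls report success, yet only the second owner remains recorded for the unit, so the first allocation is silently overwritten. With working messaging the nodes could serialize these updates; in a split-brain condition they cannot.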