Modern computers are often networked together and share resources, such as storage resources (e.g., disks). By sharing storage resources, each networked computer can store data on any storage resource in the network. One way to network computers is to cluster them, which forms a clustered network of nodes (i.e., computers). An example of a clustered network is described in “Chapter 3: VAX Clusters and Other High-Availability Systems,” VAXclusters Architecture, Programming and Management, Shah, McGraw-Hill, Inc., 1991.
In networked computers, a system of protocols called a file system manages and controls access to storage resources (e.g., writing or retrieving data) in order to preserve the integrity of stored data. One example is a cluster file system, which clustered networks use to write and retrieve data. One feature of a cluster file system is that each node accesses storage resources directly. In other words, no single node acts as a server responsible for managing storage resources, and each node treats all storage resources as essentially local. A cluster file system typically includes a distributed lock manager (DLM) that coordinates file system accesses among nodes. Examples of conventional DLMs are described in “Chapter 13: VAX/VMS Lock Manager,” VAX/VMS Internals and Data Structures, Kenah et al., Digital Press, 1984, and in Chapters 4 and 5 of Oracle 8i Internal Services for Waits, Latches, Locks and Memory, O'Reilly & Associates.
In a conventional DLM, before a node (the “lock-requesting node”) can access a particular storage resource, it first determines whether another node (the “lock-holding node”) holds a lock on that storage resource. If a lock-holding node exists, the lock-requesting node sends it a request to access the storage resource.
Upon receiving the request, the lock-holding node first completes any task it has in progress on the storage resource. For instance, if the lock-holding node is writing a block of data to the storage resource when the request arrives, it must finish that write. The lock-holding node then releases the lock, which is transferred to the lock-requesting node. These steps impose administrative costs on the conventional DLM, such as flushing the cache memory and/or the journal of the lock-holding node. Flushing the journal, for example, entails performing the operations the journal requires, together with the operations they depend on, and writing a success marker for each required operation after it is performed.
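The release-and-transfer sequence described above can be sketched as a small Python model. Everything in it (the `Node`, `JournalOp`, and `ConventionalDLM` classes, the `request_access` method, and the `admin_cost` counter) is hypothetical and illustrative, not the interface of any real DLM. Administrative cost is counted in abstract units, assumed here to be one per flushed cache block and one each for performing a journaled operation and for writing its success marker.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JournalOp:
    """Hypothetical journaled operation plus the operations it depends on."""
    name: str
    depends_on: List["JournalOp"] = field(default_factory=list)


@dataclass
class Node:
    """Hypothetical cluster node with an in-progress write, a dirty cache, and a journal."""
    name: str
    pending_write: Optional[str] = None      # block being written when a request arrives
    dirty_cache: List[str] = field(default_factory=list)
    journal: List[JournalOp] = field(default_factory=list)
    storage_log: List[str] = field(default_factory=list)  # what reached storage, in order

    def finish_pending_task(self) -> None:
        # The lock holder must complete any in-progress write before releasing.
        if self.pending_write is not None:
            self.storage_log.append(f"write {self.pending_write}")
            self.pending_write = None

    def flush_cache(self) -> int:
        # Write every dirty cached block back to storage; cost = blocks flushed.
        cost = len(self.dirty_cache)
        self.storage_log.extend(f"flush {b}" for b in self.dirty_cache)
        self.dirty_cache.clear()
        return cost

    def flush_journal(self) -> int:
        # Perform each required operation after its dependencies, then write a success marker.
        done = set()
        cost = 0

        def perform(op: JournalOp) -> None:
            nonlocal cost
            if op.name in done:
                return
            for dep in op.depends_on:
                perform(dep)
            self.storage_log.append(f"perform {op.name}")
            self.storage_log.append(f"success-marker {op.name}")
            done.add(op.name)
            cost += 2  # assumed: one unit for the operation, one for its marker

        for op in self.journal:
            perform(op)
        self.journal.clear()
        return cost


class ConventionalDLM:
    """Hypothetical model of the conventional release-and-transfer behavior."""

    def __init__(self) -> None:
        self.holders: Dict[str, Node] = {}  # storage resource -> lock-holding node
        self.admin_cost = 0                 # abstract units of flush/journal work

    def request_access(self, requester: Node, resource: str) -> None:
        holder = self.holders.get(resource)
        if holder is not None and holder is not requester:
            # Unconditional: no check of whether the holder will soon need the lock
            # again or how much data the requester intends to write.
            holder.finish_pending_task()
            self.admin_cost += holder.flush_cache()
            self.admin_cost += holder.flush_journal()
            del self.holders[resource]      # holder releases the lock
        self.holders[resource] = requester  # lock is granted to the requester


# Usage: node A holds the lock with a write in progress, dirty blocks, and a journal.
a, b = Node("A"), Node("B")
dlm = ConventionalDLM()
dlm.request_access(a, "disk0")
a.pending_write = "blk7"
a.dirty_cache = ["blk3", "blk4"]
alloc = JournalOp("alloc-inode")
a.journal = [JournalOp("update-dir", depends_on=[alloc])]
dlm.request_access(b, "disk0")
print(dlm.holders["disk0"].name, dlm.admin_cost)  # B 6
```

In this model the transfer always pays the full flush cost (2 units for the cache, 4 for the two journaled operations), no matter what either node does next.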
The conventional DLM releases and transfers locks, and so incurs these administrative costs, even when the lock-holding node will need to access the storage resource again soon after the lock is transferred to the lock-requesting node. It does so even when the lock-requesting node plans to write only a small amount of data to the storage resource. The conventional DLM is therefore inefficient because, among other things, it may require a lock-holding node to release its lock without considering whether that node will request the lock again soon after releasing it, or whether the lock-requesting node will write only a small amount of data.
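The two inefficiencies named above can be made concrete by replaying an access trace under the conventional policy and counting lock transfers. This is a minimal sketch under stated assumptions: the function `replay_conventional`, the trace format, and the `small_write` cutoff of 4096 bytes are all hypothetical, chosen only to illustrate the argument. Each counted transfer stands for one round of the release, flush, and transfer work described earlier.

```python
from typing import Dict, List, Tuple

# Hypothetical trace entry: (node, storage resource, bytes the node will write).
Access = Tuple[str, str, int]


def replay_conventional(trace: List[Access], small_write: int = 4096) -> Dict[str, int]:
    """Count lock transfers a conventional DLM would perform for the trace.

    small_write is a hypothetical cutoff for what counts as a "small" write.
    """
    holder: Dict[str, str] = {}     # resource -> current lock-holding node
    prev_holder: Dict[str, str] = {}  # resource -> holder before the latest transfer
    stats = {"transfers": 0, "small_write_transfers": 0, "bounce_backs": 0}
    for node, resource, nbytes in trace:
        current = holder.get(resource)
        if current is not None and current != node:
            # The conventional DLM always releases and transfers the lock.
            stats["transfers"] += 1
            if nbytes <= small_write:
                # Full transfer cost paid for only a small write.
                stats["small_write_transfers"] += 1
            if prev_holder.get(resource) == node:
                # The lock returns to the node that just gave it up.
                stats["bounce_backs"] += 1
            prev_holder[resource] = current
        holder[resource] = node
    return stats


# Usage: node A does large writes; node B interleaves tiny writes on the same disk.
MiB = 1 << 20
trace = [
    ("A", "disk0", MiB),
    ("B", "disk0", 512),
    ("A", "disk0", MiB),
    ("B", "disk0", 512),
    ("A", "disk0", MiB),
]
stats = replay_conventional(trace)
print(stats)  # {'transfers': 4, 'small_write_transfers': 2, 'bounce_backs': 3}
```

Every one of the four transfers is paid in full: two move the lock for a 512-byte write, and three hand the lock straight back to the node that had just released it.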