1. Statement of the Technical Field
The present invention relates to the field of resource locking and more particularly to quorum algorithms for managing resource locking in a clustered environment.
2. Description of the Related Art
The concurrent operation of multiple processes or services can introduce the problem of resource contention. Concurrent processes and services typically enjoy access to shared resources. Occasionally, individual processes and services require the exclusive use of selected resources for a period of time. Various known mechanisms allow resources to become “locked” and “unlocked” by “cooperating concurrent processes.” Where a cycle exists in the acquisition of locks, however, a deadlock can occur, as when a first process holds one lock while waiting for a second lock, and a second process holds the second lock while waiting for the first.
A common method used to avoid “resource deadlock” includes the hierarchical ordering of shared resources. In consequence of this hierarchy, locks can be acquired in the order of the hierarchy, while locks which have been acquired can be released in the opposite order. In any case, it will be recognized that resource locking is a method to grant exclusive access to shared resources. Moreover, locking a shared resource for the duration of a transaction will suspend access to the locked resource by other processes or services.
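By way of illustration only, the hierarchical ordering described above can be sketched as follows. The sketch assumes each resource is assigned a numeric rank; the names RankedLock and acquire_in_order, and the resources "disk" and "file_table", are hypothetical and do not appear in the related art.

```python
import threading
from contextlib import contextmanager


class RankedLock:
    """A lock tagged with its position in the resource hierarchy (hypothetical)."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()


@contextmanager
def acquire_in_order(*locks):
    """Acquire locks in ascending rank and release them in the opposite order."""
    ordered = sorted(locks, key=lambda lock: lock.rank)
    held = []
    try:
        for lock in ordered:
            lock.acquire()
            held.append(lock)
        # Report the order in which the locks were actually taken.
        yield [lock.name for lock in held]
    finally:
        for lock in reversed(held):
            lock.release()


disk = RankedLock(1, "disk")
file_table = RankedLock(2, "file_table")

# The caller lists the locks out of order; the helper imposes the hierarchy.
with acquire_in_order(file_table, disk) as names:
    acquisition_order = names
```

Because every process takes the locks in the same rank order regardless of the order in which it names them, no cycle can form in the acquisition of locks.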
Resource locking can be problematic in that the burden of managing access to shared resources can be placed squarely upon each concurrent process or service. To that end, where a service or process has not been pre-configured to lock a resource prior to its use, or if a lock is acquired in a non-hierarchical manner, then a potential deadlock condition may exist. Accordingly, it has been suggested that resource management can provide a conceptually better solution for solving the resource contention problem.
In the resource management model, rather than having a pool of resources visible to many concurrent processes, the resources can be gathered and managed by a single “resource manager.” Concurrent processes can request the resources from the manager, and the manager can arbitrate all resource requests to ensure fair allocation of the resources. Consequently, the arbitration of resource requests can be centralized within a single object instance. Moreover, program correctness can be guaranteed more easily. Nevertheless, although resource management can appear to be elegant from a conceptual level, programming language constraints can make resource management appear more difficult and error prone than mere resource locking.
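By way of illustration only, the resource management model can be sketched as a single object instance through which all requests pass. The class ResourceManager, its methods, and the resource and process identifiers are hypothetical. For brevity the sketch returns nothing when the pool is exhausted instead of queuing the requester, so it does not model any particular fairness policy.

```python
import threading
from collections import deque


class ResourceManager:
    """Hypothetical single arbiter that owns a pool of resources."""

    def __init__(self, resources):
        self._free = deque(resources)
        self._owners = {}
        self._guard = threading.Lock()  # serializes all arbitration decisions

    def request(self, process_id):
        """Grant a free resource to process_id, or None if none is free."""
        with self._guard:
            if not self._free:
                return None
            resource = self._free.popleft()
            self._owners[resource] = process_id
            return resource

    def release(self, process_id, resource):
        """Return a resource to the pool; only its current owner may do so."""
        with self._guard:
            if self._owners.get(resource) != process_id:
                raise ValueError("resource not held by this process")
            del self._owners[resource]
            self._free.append(resource)


manager = ResourceManager(["r1", "r2"])
a = manager.request("p1")
b = manager.request("p2")
c = manager.request("p3")  # pool exhausted at this point
manager.release("p1", a)
d = manager.request("p3")  # succeeds once p1 has released
```

The concurrent processes never see the pool itself; they see only what the manager grants, which is why the arbitration logic can be reasoned about in one place.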
To improve performance in a distributed system, it often can be preferred to share a locked resource among multiple processes or services. Additionally, locked resource sharing can be helpful where a locked resource is to be accessed multiple times. Sharing a locked resource can be particularly important given the alternative of creating and releasing a lock for each transaction with the resource, which can give rise to undesirable locking delays.
Though resource locking can be challenging in the conventional setting, the problem can become compounded in the clustered environment. In a clustered environment, clustered systems share various data and system resources such as access to disks and files. To achieve the coordination that is necessary to maintain resource integrity, the cluster must have clear criteria for membership and must disallow participation in the cluster by systems that fail to meet the established criteria. To that end, an instance of a connection manager often can be included with a cluster to create the cluster, add and remove members to and from the cluster, track which members in the cluster are active, maintain a consistent cluster membership list, provide timely notification of changes to the membership, and detect and handle possible cluster partitions.
Typically, the connection manager can ensure data integrity in the face of communication failures by using a voting mechanism. The voting mechanism, referred to as a quorum algorithm, can permit processing and I/O within a cluster only when a majority of “votes” are present in the cluster. When the majority of votes are present, the cluster is said to have a quorum. The quorum algorithm, itself, can calculate a quorum based upon any number of factors, including for example, expected votes, current votes, node votes and quorum disk votes.
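By way of illustration only, the majority test described above can be sketched as simple arithmetic on vote counts. The sketch assumes that quorum means a strict majority of the expected votes and that current votes are the sum of the node votes present plus any quorum disk votes; the function names and the vote values are hypothetical, and an actual quorum algorithm may weigh these factors differently.

```python
def quorum_threshold(expected_votes):
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def has_quorum(node_votes_present, quorum_disk_votes, expected_votes):
    """True when the votes currently present reach the majority threshold."""
    current_votes = sum(node_votes_present) + quorum_disk_votes
    return current_votes >= quorum_threshold(expected_votes)


# Assumed configuration: four nodes with one vote each, plus a quorum disk
# contributing one vote, for five expected votes in total.
expected = 5
threshold = quorum_threshold(expected)

# Two nodes that cannot reach the quorum disk hold 2 of 5 votes.
two_nodes_only = has_quorum([1, 1], 0, expected)

# Two nodes that can reach the quorum disk hold 3 of 5 votes.
two_nodes_with_disk = has_quorum([1, 1], 1, expected)
```

In this assumed configuration the quorum disk vote breaks what would otherwise be an even split between two pairs of nodes.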
When resource locking is employed within the clustered environment, typically one cluster obtains a lock on a desired resource. Subsequently, transactions within the cluster requiring use of the locked resource can be routed to the locked resource as managed in the cluster. Still, when employing resource locking in the clustered environment, one must plan for network faults. For instance, when a cluster becomes partitioned, nodes formerly within the cluster are no longer able to “see” nodes in other partitions. Where one node acted as the locking resource prior to the partitioning, nodes in other partitions will not be able to resolve whether the lock has expired, or whether the cluster has been partitioned. Typically, a quorum algorithm can be applied in this circumstance.
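By way of illustration only, the partitioned case can be sketched as follows. The sketch assumes five nodes carrying one vote each and assumes that a partition may continue processing only if the nodes it can see form a majority of the original cluster; the node names and the function may_proceed are hypothetical. What a surviving partition then does with the former lock is not addressed here.

```python
def majority(total_votes):
    """Smallest count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def may_proceed(visible_nodes, all_nodes):
    """A partition may continue only if it sees a majority of the cluster.

    Assumes one vote per node.
    """
    return len(visible_nodes) >= majority(len(all_nodes))


cluster = {"n1", "n2", "n3", "n4", "n5"}

# A network fault splits the cluster; n1 held the lock before the split.
partition_a = {"n1", "n2"}
partition_b = {"n3", "n4", "n5"}

a_ok = may_proceed(partition_a, cluster)  # 2 of 5 nodes visible
b_ok = may_proceed(partition_b, cluster)  # 3 of 5 nodes visible
```

Because two disjoint partitions cannot both hold a strict majority, at most one side continues, even though neither side can tell directly whether the other has failed or is merely unreachable.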
While quorum algorithms have been successfully applied in respect to storage devices, such quorum algorithms usually depend upon low-level primitives associated with the storage devices. Thus, quorum algorithms heretofore have not been successfully applied to higher levels of the computing hierarchy such as a computing grid. As defined in exemplary fashion by the Open Grid Services Architecture (OGSA), a computing grid can provide protocols both in discovery and also in binding of Web services across distributed systems in a manner which would otherwise not be possible through the exclusive use of registries, directories and discovery protocols.
More specifically, as described both in Ian Foster, Carl Kesselman, and Steven Tuecke, The Anatomy of the Grid, Intl J. Supercomputer Applications (2001), and also in Ian Foster, Carl Kesselman, Jeffrey M. Nick and Steven Tuecke, The Physiology of the Grid, Globus.org (Jun. 22, 2002), a computing grid can provide distributed computing infrastructure through which grid services instances can be created, named and discovered by requesting clients. Grid services extend mere Web services by providing enhanced resource sharing and scheduling support, support for long-lived state commonly required by sophisticated distributed applications, as well as support for inter-enterprise collaborations. Moreover, while Web services alone address discovery and invocation of persistent services, grid services support transient service instances which can be created and destroyed dynamically.
Notable benefits of using grid services can include a reduced cost of ownership of information technology due to the more efficient utilization of computing resources, and an improvement in the ease of integrating various computing components. Thus, the grid mechanism, and in particular, a grid mechanism which conforms to the OGSA, can implement a service-oriented architecture through which a basis for distributed system integration can be provided—even across organizational domains. Nevertheless, the low-level primitives ordinarily associated with storage devices are not similarly associated with the computing grid. Thus, to date quorum algorithms useful for managing resource locking have been unavailable for use in the grid context.