1. Field of the Invention
The present invention relates generally to multiprocessing systems. More particularly, the invention relates to the arbitration of access among multiple competing processing nodes to a shared resource by conducting a membership protocol among all nodes of the system including the shared resource, where the shared resource subsequently fences nodes outside its membership view.
2. Description of Related Art
Multiprocessing computing systems perform a single task using a plurality of processing “elements”, also called “nodes”, “participants”, or “members”. The processing elements may comprise multiple individual processors linked in a network, or a plurality of software processes or threads operating concurrently in a coordinated environment. In a network configuration, the processors communicate with each other through a network that supports a network protocol. This protocol may be implemented using a combination of hardware and software components. In a coordinated software environment, the software processes are logically connected together through some communication medium such as an Ethernet network. Whether implemented in hardware, software, or a combination of both, the individual elements of the network are referred to individually as members, and together as a group.
Frequently, the nodes of a multiprocessing system have common access to a “shared resource”. As an example, the shared resource may comprise a storage device, such as a magnetic “hard” disk drive, tape drive or library, optical drive or library, etc. Resources may be shared for a number of different reasons, such as avoiding the expense of providing separate resources for each node, guaranteeing data consistency, etc.
FIG. 1A shows a multiprocessing system 100 where multiple processing nodes 102-104 have common access to a shared resource 106. The processing nodes 102-104 and the shared resource 106 are interconnected by communications paths 108-112. A problem arises when communication between the nodes 102-104 is interrupted, for example, due to failure of the communications path 108. This problem concerns the nodes' competing access to the resource 106, possibly resulting in extremely inefficient operation of the system 100.
In the absence of any scheme for arbitrating disputes between the incommunicant nodes 102-104, the system 100 may experience “thrashing” back and forth between the nodes 102-104, each node successively fencing the other node from resource access. This situation is undesirable, chiefly due to the inefficient time each node spends vying for access to the resource 106 rather than computing or actually accessing the resource 106.
Another approach to address the failure of the communications path 108 is to designate one of the nodes 102-104, in advance, to be master of the resource 106 in the event of a resource failure. This way, at least the designated node will enjoy hassle-free access to the shared resource 106. However, the other node is completely blocked from accessing the resource 106. And, if the designated node fails, then use of the resource 106 is absolutely frustrated.
Still another approach to failure of the communications path 108 is for the nodes 102-104 to communicate via the resource 106. For some users, this approach may be too inefficient, because communications between the nodes 102-104 occupy communications bandwidth otherwise used to exchange data with the shared resource 106. Furthermore, the nodes 102-104 are encumbered with additional overhead required for fault detection and resource control.
Consequently, due to certain unsolved problems, known communications recovery schemes are not completely adequate for some applications such as those with shared resources.
Broadly, the invention concerns a multiprocessing system that arbitrates access among multiple competing processing nodes to a shared resource by conducting a membership protocol among all nodes of the system including the shared resource, where the shared resource subsequently fences nodes outside its membership view. To determine the shared resource's membership view, active nodes repeatedly subscribe to the shared resource during prescribed membership intervals. From these subscriptions, an output membership view is generated for the shared resource. The membership protocol for the passive node ultimately ends when the membership view meets a termination condition guaranteeing asymmetric safety.
More specifically, in one embodiment a method is provided to determine access among multiple active nodes to a passive node in a multiprocessing system, with a communications network interconnecting the passive node and the active nodes. First, one of the nodes makes a membership protocol announcement. Responsive to the membership protocol announcement, a timer is started to expire after a fixed time. The time between starting and expiration of the timer defines a current membership interval.
Also responsive to the membership protocol announcement, each active node commences attempts at inter-nodal communications to identify all other nodes with which communication has not failed. All nodes so identified comprise a membership view. Further responsive to the membership protocol announcement, each active node commences an attempt to submit a subscription message to the passive node.
Subsequently, the timer expires, thereby closing the current membership interval. In response to the timer expiring, each active node establishes its membership view, made up of all other nodes identified during the current membership interval. Also established is the passive node's membership view, comprising all active nodes successfully submitting a subscription message during the current membership interval. The membership views of all nodes are integrated, using asymmetric safety, to establish an updated membership view of each node. Subsequent access to the passive node is then restricted according to the passive node's updated membership view.
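By way of illustration only, the passive node's side of one membership interval might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name PassiveNode, its method names, and the use of explicit time values in place of a hardware timer are all hypothetical, and the step that integrates the views of all nodes using asymmetric safety is deliberately omitted.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PassiveNode:
    """Hypothetical model of the shared resource acting as the passive node."""

    interval: float                        # fixed length of a membership interval
    opened_at: Optional[float] = None      # start time of the open interval, if any
    pending: Set[str] = field(default_factory=set)  # subscribers seen this interval
    view: Set[str] = field(default_factory=set)     # current membership view

    def announce(self, now: float) -> None:
        """A membership protocol announcement starts the timer."""
        self.opened_at = now
        self.pending = set()

    def subscribe(self, node_id: str, now: float) -> bool:
        """Accept a subscription only while the current interval is open."""
        if self.opened_at is None or now >= self.opened_at + self.interval:
            return False
        self.pending.add(node_id)
        return True

    def expire(self) -> Set[str]:
        """Timer expiry closes the interval and fixes the membership view."""
        self.view = set(self.pending)
        self.opened_at = None
        return self.view

    def may_access(self, node_id: str) -> bool:
        """Nodes outside the membership view are fenced."""
        return node_id in self.view
```

In this sketch, a node that subscribes after the interval has closed is simply left out of the view and is therefore refused access until a later interval admits it.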
The invention also includes another embodiment of coordinating access to shared resources in a multiprocessing system with multiple nodes subject to communications and node failures. The present invention prescribes that when communication or node failures are suspected, coordination problems be resolved by having each node, including nodes representing shared resources, participate in a membership protocol that provides asymmetric safety. For simplicity, the present invention will be described in terms of methods that apply to a multiple node system containing one shared resource node. It will be obvious to one skilled in the art how to extend these methods to apply to multiple shared resource nodes.
One exemplary approach chooses a leader node among the nodes contending for the shared resource node. Depending on the access needs, the leader node may then have exclusive access to the shared resource node or the leader node may control the access of others, for example by maintaining a lock table for the shared resource node.
In one embodiment a method is provided to choose a new leader when it is suspected that the previous leader is no longer functioning properly or no longer able to access the shared resource node. Responsive to some indication that the previous leader may have failed (such as the timeout of a message requesting a response from the leader, or any such indication from any failure detection mechanism), a node may invoke a membership protocol that provides asymmetric safety. The participants in this membership protocol are all the nodes that can potentially access the shared resource and the shared resource node, itself. On completion of the membership protocol, if a regular (non shared resource) node finds that the shared resource node is not in its new membership view, the regular node attempts to rejoin the shared resource node; otherwise, after ascertaining that the shared resource node has completed the membership protocol, the regular node computes the identity of the new leader based on its local membership view, using a preselected one of many available policies for such selection (e.g. choose the first member in lexicographic order of id, or choose the old leader if it is still in the membership view or the next member after the old leader in lexicographic order of id, etc.). As soon as it has identified itself as the new leader, a regular node may begin acting in its capacity as leader. On completion of the membership protocol, a shared resource node fences all nodes not in its new membership view, preventing these nodes from accessing all but a special membership processing area of the shared resource.
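The two example selection policies named above, together with the regular node's decision on completion of the membership protocol, might be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, node ids are assumed to be strings compared lexicographically, the shared resource node is assumed to be excluded from the leader candidates, and wrapping around to the first member when no id follows the old leader is an assumption not stated in the text.

```python
from typing import Iterable, Optional, Tuple


def first_in_order(view: Iterable[str]) -> Optional[str]:
    """Policy 1: choose the first member in lexicographic order of id."""
    members = sorted(view)
    return members[0] if members else None


def keep_old_or_next(view: Iterable[str], old_leader: str) -> Optional[str]:
    """Policy 2: keep the old leader if still present, else the next id after it."""
    members = sorted(view)
    if not members:
        return None
    if old_leader in members:
        return old_leader
    for member in members:
        if member > old_leader:
            return member
    return members[0]  # assumption: wrap around when no id follows the old leader


def after_protocol(view: Iterable[str], resource_id: str,
                   old_leader: str) -> Tuple[str, Optional[str]]:
    """Decide what a regular node does once the membership protocol completes."""
    members = set(view)
    if resource_id not in members:
        # The shared resource node is missing from the view: try to rejoin it.
        return ("rejoin", None)
    candidates = members - {resource_id}  # assumption: the resource never leads
    return ("leader", keep_old_or_next(candidates, old_leader))
```

Because every regular node applies the same preselected policy to its own local view, nodes holding the same view arrive at the same leader without exchanging further messages.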
Since a shared resource node may not always be able to perform all the functions required by the membership protocols referenced above, in another embodiment a new set of membership protocols is provided to function as part of the method described above. Each membership protocol described herein therefore has two counterpart membership protocols, one performed by active nodes that can perform all the functions required in the original protocol, and one performed by a passive node with a much more restricted repertoire.
Accordingly, one aspect of the invention is a method of coordinating access to a shared resource in a multiprocessing system. In contrast, a different embodiment of the invention may be implemented to provide an apparatus such as a multiprocessing system, configured to coordinate access to a shared resource among multiple processing nodes. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform method steps for coordinating access to a shared resource in a multiprocessing system.
The invention affords its users a number of distinct advantages. Advantageously, the invention determines access to shared resources in a multiprocessing system using a membership protocol that achieves a non-blocking termination in a fixed amount of time. Even with crash failures, this approach accurately determines membership within a fixed finite time after crash detection. Furthermore, this approach imposes a minimal burden on the normal operation of the shared resource, leaving as much communications bandwidth as possible for the shared resource to conduct normal communications with the active nodes. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.