The present invention relates to the operation of process groups by a group of nodes in a network of processing nodes, and more particularly relates to the serializing of actions of independent process groups by a group of nodes in a network of processing nodes.
When certain events occur in a processing node, such as the failure of a process or the failure of a node, it is desirable to serialize the actions of the process group in response to the event. In a parallel environment, each node in an assigned group of nodes has processes for processing data which may be replicated in other nodes. If the same process exists on more than one node, the common processes are referred to as a process group, and the individual process in each node is referred to as a process group member, or a member, or a process. If one node in the group fails, the other nodes in the group are notified such that the other nodes may pick up the work of the failing node, or otherwise accommodate the failure such that processing may continue.
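The terminology above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the mechanism of the invention: the `Cluster` class, its method names, and the node and process identifiers are all hypothetical, and notification is modeled simply as appending to a list.

```python
from collections import defaultdict


class Cluster:
    """Toy model (hypothetical): nodes host named processes, and
    same-named processes on different nodes form a process group."""

    def __init__(self):
        self.members = defaultdict(set)  # process name -> ids of nodes running it
        self.notifications = []          # (notified node, group name, failed node)

    def start(self, node, process):
        # Register a process group member on the given node.
        self.members[process].add(node)

    def process_groups(self):
        # Only a process that exists on more than one node counts as a group.
        return {name: set(nodes)
                for name, nodes in self.members.items()
                if len(nodes) > 1}

    def fail_node(self, failed):
        # Drop the failed node from every group it belonged to and record
        # a notification for each surviving member of that group.
        for name, nodes in self.members.items():
            if failed in nodes:
                nodes.discard(failed)
                for survivor in sorted(nodes):
                    self.notifications.append((survivor, name, failed))


cluster = Cluster()
cluster.start("n1", "db")
cluster.start("n2", "db")
cluster.start("n1", "log")   # runs on one node only, so it is not a group
cluster.fail_node("n1")      # the "db" member on n2 is notified
```

In this sketch the surviving member decides for itself how to accommodate the failure; the model only records who was told about it.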
In the past, when one node of a group failed, all process groups were forced to recover. In some cases, an order was established such that one process recovered before a dependent process recovered. One such system is disclosed in "High Availability Mechanisms of VAX DBMS Software".
In some past systems, a distinction could not be made between a process failure and a node failure.