This application contains subject matter which is related to the subject matter of the following applications, which are assigned to the same assignee as this application and filed on the same day as this application. The below-listed applications are hereby incorporated herein by reference in their entirety:
“DYNAMIC RECONFIGURATION OF A QUORUM GROUP OF PROCESSORS IN A DISTRIBUTED COMPUTING SYSTEM” by Briskey et al., Ser. No. 09/387,666;
“RECOVERY PROCEDURE FOR A DYNAMICALLY RECONFIGURED QUORUM GROUP OF PROCESSORS IN A DISTRIBUTED COMPUTING SYSTEM” by Briskey et al., Ser. No. 09/387,185;
“RELAXED QUORUM DETERMINATION FOR A QUORUM BASED OPERATION” by Briskey et al., Ser. No. 09/386,549.
This invention relates to distributed computing systems, and more particularly, to the dynamic reconfiguration of a quorum group of processors within a distributed computing system, and to a recovery procedure for one or more processors of the group which were unavailable during the dynamic reconfiguration.
Distributed computing systems employ a plurality of processing elements. These processing elements might be individual processors linked together in a network or a plurality of software instances operating concurrently in a coordinated environment. In the former case, the processors communicate with each other through a network which supports a network protocol. The protocol might be implemented by using a combination of hardware and software components. Processing elements typically communicate with each other by sending and receiving messages or packets through a common interface. One type of distributed computing system is a shared nothing distributed system wherein the processing elements do not share storage. Within such a system, the elements must exchange messages in order to agree on the state of the distributed system.
Thus, within a shared nothing distributed processing system, a message exchange protocol is needed. For example, the message exchange protocol will seek to determine the current state of a database in the distributed processing system. Specifically, the protocol needs to define which processing element has the latest version of the database, since processing elements can create different database versions. As is well known, a high availability system allows one or more processing elements to become unavailable while the system continues to perform processing. Therefore, the database can be modified within a high availability distributed processing system while one or more processing elements are unavailable (e.g., off line). When a previously unavailable processing element becomes available, an updated version of the database must be provided to that processing element.
Conventional shared nothing distributed processing systems have the restriction that a group of processing elements participating in a quorum driven recovery must be static. That is, once a server group is defined, members cannot be added or removed dynamically, i.e., while the database is running and one or more members are potentially unavailable. The only way to make a reconfiguration change in a conventional shared nothing distributed processing system is to use a redefine operation which requires a change to a configuration file in all servers of the system, and therefore requires that all servers be currently available for the reconfiguration change.
Notwithstanding the above, in the case of highly available distributed processing systems, such as database servers, it is deemed desirable to allow the addition or deletion of servers without requiring that all servers of a group of servers be available. The distributed server recovery procedure (DSRP) provided herein allows for this modification of the configuration of the server group requiring only that a majority (quorum) of the currently defined servers be available for the modification to proceed. For example, some servers may be unconfigured (excluded from the group) while they are down, and other servers may be added. The process of adding or deleting servers while one or more servers may be unavailable is referred to herein as “dynamically reconfiguring” the quorum group of processors. Again, the traditional procedures for recovery of distributed servers require a static configuration environment.
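By way of illustration only, the majority requirement stated above can be sketched as follows. The function names `majority` and `can_reconfigure`, the use of server names as strings, and the reading of "majority" as more than half of the currently defined servers are assumptions made for this sketch and are not taken from the specification.

```python
def majority(n):
    """Smallest whole number strictly greater than half of n (assumed reading)."""
    return n // 2 + 1


def can_reconfigure(defined, available):
    """Return True if a majority of the currently defined servers is available.

    `defined` and `available` are sets of hypothetical server names.
    Servers that are available but not part of the defined group are ignored.
    """
    reachable = defined & available
    return len(reachable) >= majority(len(defined))


group = {"s1", "s2", "s3", "s4", "s5"}
print(can_reconfigure(group, {"s1", "s2", "s3"}))  # three of five defined servers are up
print(can_reconfigure(group, {"s1", "s2"}))        # only two of five are up
```

Under these assumptions, a group of five defined servers may be reconfigured while two members are down, but not while three are down.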
To summarize, a method is provided for determining quorum for a quorum based operation of a distributed computing system. This method includes: establishing a number of processors in a group and recording which of the processors are inactive and which are active to arrive at a number of inactive processors and a number of active processors; obtaining a result by subtracting from the number of processors in the group the number of inactive processors less one and comparing the result with a majority value of the number of processors in the group; and if the result is less than the majority value, then establishing the quorum as a majority number of the active processors less one, and otherwise establishing the quorum as the majority number of the active processors of the group.
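As a non-limiting sketch, the quorum determination summarized above might be expressed as shown below. Several points are assumptions made for this illustration: the phrase "the number of inactive processors less one" is read as the quantity (inactive - 1), which is then subtracted from the group size; a "majority" of n is taken to be n // 2 + 1; and the active count is taken to be the group size minus the inactive count. The summary wording admits other readings (for example, group size minus inactive count, minus one), and the function names are hypothetical.

```python
def majority(n):
    """Smallest whole number strictly greater than half of n (assumed reading)."""
    return n // 2 + 1


def relaxed_quorum(group_size, inactive):
    """Quorum under one reading of the summarized method.

    group_size: number of processors defined in the group.
    inactive:   number of those processors recorded as inactive.
    """
    # Assumption: every processor is recorded as either active or inactive.
    active = group_size - inactive

    # Assumed reading: subtract (inactive - 1) from the group size.
    result = group_size - (inactive - 1)

    if result < majority(group_size):
        # Relaxed case: one less than a majority of the active processors.
        return majority(active) - 1
    # Otherwise: a majority of the active processors.
    return majority(active)


print(relaxed_quorum(5, 1))
print(relaxed_quorum(6, 4))
```

Under this reading, the relaxed branch is taken only when enough processors are inactive that the computed result falls below a majority of the full group.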
System and computer program products corresponding to the above-summarized methods are also described and claimed herein.
To restate, provided herein is a reconfiguration capability for dynamically reconfiguring a quorum group of processors notwithstanding that one or more processors of the group may be unavailable, as well as a recovery procedure for implementation by the processors of the group when the one or more previously unavailable processors become available. By being able to dynamically reconfigure a group of processors while one or more of the processors are unavailable, a system administrator can ensure that critical systems are maintained even if one or more processors become unavailable, provided that a quorum of processors remains. The dynamic reconfiguration capabilities and recovery procedures described herein thus provide greater flexibility in a high availability, distributed computing environment. A relaxed quorum calculation is also presented for use with a quorum based operation, such as the recovery procedure described herein.