The present invention relates generally to distributed computing systems, and specifically to treatment of configuration changes in clusters used in distributed computing applications.
Computer clusters are widely used to enable high availability of computing resources, coupled with the possibility of horizontal growth, at reduced cost by comparison with collections of independent systems. Clustering is also useful in disaster recovery. A wide range of clustering solutions are currently available, including 390 Sysplex, RS/6000 SP, HACMP, PC Netfinity and AS/400 Cluster, all offered by IBM Corporation, as well as Tandem Himalaya, Hewlett-Packard Mission Critical Server, Compaq TruCluster, Microsoft MSCS, NCR LifeKeeper and Sun Microsystems Project Cascade. An AS/400 Cluster, for example, supports up to 128 computing nodes, connected via any Internet Protocol (IP) network. A developer of a software application can define and use groups of physical computing entities (such as computing nodes or other devices) or logical computing entities (such as files or processes) to run the application within the cluster environment. In the context of the present patent application and in the claims, such entities are also referred to as group members, and the term “entity” is used to refer interchangeably to physical and logical computing entities.
Distributed group communication systems (GCSs) enable applications to exchange messages within groups of cluster entities in a reliable, ordered manner. For example, the OS/400 operating system kernel for the above-mentioned AS/400 Cluster includes a GCS in the form of middleware for use by cluster applications. This GCS is described in an article by Goft et al., entitled “The AS/400 Cluster Engine: A Case Study,” presented at the International Group Communications Conference IGCC 99 (Aizu, Japan, 1999), which is incorporated herein by reference. The GCS ensures that if a message addressed to the entire group is delivered to one of the group members, the message will also be delivered to all other live and connected members of the group, so that group members can act upon received messages and remain consistent with one another. A group member is considered to be “alive” if it is functioning and able to perform a part in a distributed software application. Typically, “liveness” testing procedures are defined and applied by the GCS to determine which members are alive and which are not.
Another well-known GCS is “Ensemble,” which was developed at Cornell University, as were its predecessors, “ISIS” and “Horus.” Ensemble is described in the “Ensemble Reference Manual,” by Hayden (Cornell University, 1997), which is incorporated herein by reference.
A key function of the GCS is to inform software applications running on the computing group of the identities of the connected set of members in the group. Whenever the group configuration changes, due to one or more members leaving the group or new members joining, the GCS sends out a membership change message with a current, updated membership list. For example, the Ensemble system uses a class called Maestro_GroupMember, described at www.cs.cornell.edu/Info/Projects/Ensemble/Maestro/groud.htm, to manage and distribute membership change messages. In this Ensemble class and in other systems known in the art, the form of the membership change message is the same whether the departing members have left the group voluntarily or due to a fault, such as a node crash or network failure. Similarly, such membership change messages contain no information as to the state of new group members or as to whether the new members have been members of this group in the past.
It is an object of some aspects of the present invention to provide improved methods and systems for enabling computer applications running on a cluster of participating entities to deal with membership changes in the cluster.
In preferred embodiments of the present invention, a group communication system (GCS), for use within a group of clustered computing entities, provides membership change messages to software applications running in the group. These messages not only identify which members have joined or left the group, but also indicate the reasons for the membership change. The reasons are typically gleaned by the GCS from various sources, such as network communication and topology layers, information provided by the members who join or leave the group, and diagnostics and control components of the GCS itself. Knowing the reasons for membership changes can be of crucial importance to many distributed applications, and particularly to cluster applications, such as database and cluster management applications, which must maintain a common state or require consistency among the group members.
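By way of illustration only, a membership change message of the kind described above might be represented as in the following minimal Python sketch. The names ChangeReason, MembershipChange and needs_recovery, and the particular reason codes, are assumptions introduced for this example and are not taken from any existing GCS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class ChangeReason(Enum):
    """Hypothetical reason codes; the names are illustrative assumptions."""
    JOINED = "joined"
    LEFT_VOLUNTARILY = "left voluntarily"
    NODE_FAILURE = "node failure"
    NETWORK_FAILURE = "network failure"
    REMERGED = "re-merged"


@dataclass(frozen=True)
class MembershipChange:
    """A membership change message that also carries the reason for the change."""
    members: FrozenSet[str]   # current, updated membership list
    joined: FrozenSet[str]    # members added by this change
    left: FrozenSet[str]      # members removed by this change
    reason: ChangeReason      # why the change occurred


def needs_recovery(change: MembershipChange) -> bool:
    """Return True when members left because of a fault rather than voluntarily."""
    faults = {ChangeReason.NODE_FAILURE, ChangeReason.NETWORK_FAILURE}
    return bool(change.left) and change.reason in faults
```

With such a message, an application could, for example, start recovery of a departed member's work only when the departure was caused by a fault, and skip it when the member left voluntarily.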
Although preferred embodiments described herein are based on a GCS, it will be appreciated that the principles of the present invention may similarly be implemented in substantially any distributed computing environment in which there is a mechanism for keeping track of membership of entities in a computing group or cluster. As noted above, such entities may comprise either physical or logical entities.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for controlling operation of a computer software application running on a plurality of computing entities, which are members of a group of mutually-linked computing entities running the application within a distributed computing system, the method including:
receiving an indication of a change in membership of the group together with a reason for the change; and
delivering a membership change message to the members, so as to inform the members of the change and of the reason for the change.
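The two steps recited above may be sketched, under simplifying assumptions, as a single-process toy in Python. The class ToyGroup, its method names and the string reason codes are hypothetical; a real GCS would deliver the message over a network with ordering and reliability guarantees that this sketch does not attempt to model.

```python
from typing import Callable, Dict, Tuple

# (sorted membership list, member that changed, reason for the change)
Message = Tuple[Tuple[str, ...], str, str]


class ToyGroup:
    """Hypothetical in-process stand-in for GCS middleware."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], None]] = {}

    def add_member(self, name: str, handler: Callable[[Message], None]) -> None:
        # Indication received: a new member has joined.
        self._handlers[name] = handler
        self._deliver(name, "joined")

    def remove_member(self, name: str, reason: str) -> None:
        # Indication received together with its reason,
        # e.g. "left_voluntarily" or "node_failure".
        del self._handlers[name]
        self._deliver(name, reason)

    def _deliver(self, changed: str, reason: str) -> None:
        # Deliver the same message, including the reason, to every current member.
        message: Message = (tuple(sorted(self._handlers)), changed, reason)
        for handler in list(self._handlers.values()):
            handler(message)
```

In use, each member registers a handler; when a member is removed with a stated reason, every remaining member receives the updated membership list together with that reason.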
Preferably, the indication is received by group communication system middleware, which delivers the membership change message to the members. Further preferably, receiving the indication of the change includes detecting a failure of the group communication system at a node in the distributed computing system.
Additionally or alternatively, receiving the indication of the change includes discovering a topology change in the distributed computing system, wherein discovering the topology change includes detecting a node in the system that has become available to run the application in the group. Preferably, detecting the node that has become available includes determining whether or not the node was previously separated from the group, and delivering the message includes informing the members as to whether or not the node previously belonged to the group.
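One possible way to determine whether a newly available node was previously separated from the group is to remember which members were lost through faults. The following Python sketch assumes this bookkeeping approach; the class MembershipTracker and the reason strings "remerged" and "joined" are hypothetical names chosen for illustration.

```python
from typing import Iterable, Set


class MembershipTracker:
    """Hypothetical tracker that remembers members separated by a fault."""

    def __init__(self, initial: Iterable[str]) -> None:
        self.current: Set[str] = set(initial)
        self.separated: Set[str] = set()  # lost through a fault, not a voluntary leave

    def on_fault(self, node: str) -> None:
        # A topology change removed the node from the group.
        self.current.discard(node)
        self.separated.add(node)

    def on_available(self, node: str) -> str:
        # Decide which reason to report when a node becomes available.
        previously_member = node in self.separated
        self.separated.discard(node)
        self.current.add(node)
        return "remerged" if previously_member else "joined"
```

The returned reason would then be included in the membership change message, so that the members are informed whether or not the node previously belonged to the group.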
Further additionally or alternatively, receiving the indication includes receiving notice of a communication failure in a network linking the computing entities or receiving notice of a failure of a node in the distributed computing system. Preferably, receiving the notice of the failure of the node includes receiving a report of a failure in a liveness check of the node.
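A liveness check of the kind referred to above could, as one assumption among many possible designs, be based on heartbeat timestamps. In the following Python sketch the function failed_liveness, its parameters and the timeout policy are all hypothetical and serve only to show how a report of a failed check might be produced.

```python
from typing import Dict, List


def failed_liveness(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return, in sorted order, the nodes whose last heartbeat is older than timeout.

    last_heartbeat maps a node name to the time (in seconds) at which its most
    recent heartbeat was seen; now is the current time on the same clock.
    """
    return sorted(
        node for node, seen in last_heartbeat.items() if now - seen > timeout
    )
```

Each node reported by such a check could then be removed from the group with node failure given as the reason for the change.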
Still further additionally or alternatively, receiving the indication includes receiving notice that a new member has joined the group or that one of the members has left the group voluntarily. Preferably, delivering the membership change message includes notifying the other members that the one of the members has left the group voluntarily.
Yet further additionally or alternatively, delivering the membership change message includes notifying the members that one or more members have left the group due to a specified failure in the system or that one or more members, previously separated from the group, have re-merged with the group.
Preferably, delivering the membership change message includes delivering substantially the same message to all of the members of the group, wherein substantially all of the members respond to the message in a mutually-consistent fashion.
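Mutually-consistent behavior can follow when every member applies the same deterministic rule to the same message. The Python sketch below assumes a hypothetical application rule (the lowest-named member acts as coordinator, and recovery is flagged only after a node failure); the function react and the state keys are illustrative assumptions and are not part of any GCS.

```python
from typing import Any, Dict, Optional, Tuple

# (sorted membership list, reason for the change)
Message = Tuple[Tuple[str, ...], str]


def react(state: Dict[str, Any], message: Message) -> Dict[str, Any]:
    """Deterministically derive a member's new state from a membership change message."""
    members, reason = message
    coordinator: Optional[str] = min(members) if members else None
    new_state = dict(state)
    new_state["members"] = members
    new_state["coordinator"] = coordinator          # same choice at every member
    new_state["recovery_needed"] = reason == "node_failure"
    return new_state
```

Because the rule depends only on the content of the message, members that start from the same state and receive the same message arrive at the same new state without further coordination.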
There is also provided, in accordance with a preferred embodiment of the present invention, distributed computing apparatus, including:
a computer network; and
a group of computer nodes, mutually-linked by the network so as to run a computer software application, and adapted so that responsive to an indication received at one of the nodes of a change in membership of the group, a membership change message is delivered to the members via the network, informing the members of the change and of a reason for the change.
There is further provided, in accordance with a preferred embodiment of the present invention, a computer software product for controlling operation of an application running on a plurality of computing entities, which are members of a group of mutually-linked computing entities running the application within a distributed computing system, the product including a computer-readable medium in which computer program instructions are stored, which instructions, when read by the computing entities, cause at least one of the entities to receive an indication of a change in membership of the group together with a reason for the change, and to deliver a membership change message to the members, so as to inform the members of the change and of the reason for the change.