The present invention relates generally to distributed computing systems, and specifically to management of communications among computing entities in distributed computing applications.
Computer clusters are widely used to enable high availability of computing resources, coupled with the possibility of horizontal growth, at reduced cost by comparison with collections of independent systems. Clustering is also useful in disaster recovery. A wide range of clustering solutions are currently available, including 390 Sysplex, RS/6000 SP, HACMP, PC Netfinity and AS/400 Cluster, all offered by IBM Corporation, as well as Tandem Himalaya, Hewlett-Packard Mission Critical Server, Compaq TruCluster, Microsoft MSCS, NCR LifeKeeper and Sun Microsystems Project Cascade. An AS/400 Cluster, for example, supports up to 128 computing nodes, connected via any Internet Protocol (IP) network. A developer of a software application can define and use groups of physical computing entities (such as computing nodes or other devices) or logical computing entities (such as files or processes) to run the application within the cluster environment. In the context of the present patent application and in the claims, such entities are also referred to as group members.
Distributed group communication systems (GCSs) enable applications to exchange messages reliably within a group of entities in a cluster. For example, the OS/400 operating system kernel for the above-mentioned AS/400 Cluster includes a GCS in the form of middleware for use by cluster applications. This GCS is described in an article by Goft et al., entitled "The AS/400 Cluster Engine: A Case Study," presented at the International Group Communications Conference IGCC 99 (Aizu, Japan, 1999), which is incorporated herein by reference. The GCS allows messages to be multicast to all of the members of a group and assigns a uniform ordering to all of the broadcast messages. Failure to deliver a multicast message to one or more of the members, or even delivery out of order, can cause failures and bugs in software applications running in the cluster. To avoid such problems, the GCS ensures that if a multicast message is delivered to one of the group members, it will also be delivered to all other live and connected members of the group in the same order.
Other Group Communication Systems share this feature of uniform ordering of multicast messages among the group members. One example is "Ensemble," a GCS that was developed at Cornell University, as were its predecessors "ISIS" and "Horus." Ensemble is described in the "Ensemble Reference Manual," by Hayden (Cornell University, 1997), which is incorporated herein by reference. Another example is the IBM Phoenix system, described in U.S. Pat. No. 5,748,958, whose disclosure is likewise incorporated herein by reference.
Some of the names mentioned herein are trademarks of their respective owners.
It is an object of some aspects of the present invention to provide methods for convenient and reliable distribution of messages to sub-groups in a group computing environment.
It is a further object of some aspects of the present invention to provide GCS middleware offering ordered sub-group messaging capability.
In preferred embodiments of the present invention, a group communication system (GCS) for use in a group of computing entities allows a software application to define sub-groups that contain subsets of the members of the group. The GCS enables the application not only to convey multicast messages to all of the members of the group, but also to distribute sub-group messages to and among the sub-group members in a manner analogous to the distribution of the full-group multicast messages. The sub-group messages are uniformly ordered with respect to one another and to the full-group multicast messages.
The present invention thus overcomes a limitation of group communication systems known in the art, such as the above-mentioned AS/400 Cluster Engine and Ensemble, which enable developers of group computing applications to define multicast and point-to-point messages, but have no mechanism for handling sub-group messaging. In these existing systems, sub-group messages may be sent as multicast messages to all of the group members, but this method is wasteful of computing resources. Alternatively, point-to-point messages may be sent to all of the members who are identified as belonging to a given sub-group, but managing these messages at the application level is complicated and prone to error, particularly in terms of maintaining correct message ordering. Although it is theoretically also possible to define each sub-group as a new group, with its own message ordering, this solution raises the even more difficult problem of guaranteeing relative ordering between the messages in the new group and the original, larger group.
In some preferred embodiments of the present invention, the GCS is provided to the software application developer as a middleware package. The package preferably includes an application program interface (API), which enables sub-group messages to be defined simply and conveniently. Messages to be conveyed by the software application between group members are processed by a GCS protocol layer. This GCS layer interacts with lower network communication layers linking the members, such as IP (Internet Protocol) and UDP (User Datagram Protocol) layers, and assigns a uniform ordering to all multicast messages.
The GCS layer also includes a filter layer (or sub-layer), which receives a target list of sub-group members who are to receive each sub-group message and inserts an identification of the target sub-group in the message. Such sub-group messages are transmitted by the GCS using its normal, ordered multicast mechanism. When the sub-group message reaches each of the members, the respective filter layer passes the message on to the member for processing only if the member is on the target list inserted in the message. In this manner, proper message ordering is maintained, both within the sub-group and with respect to general multicast messages. Meanwhile, the members who are not sub-group members, and are therefore not required to take any action on the sub-group messages, are relieved of the overhead of processing them.
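By way of illustration only, the filter sub-layer described above might be sketched in Python as follows. The class and method names (`ToyOrderedMulticast`, `FilteredMember`, `send_subgroup`) are hypothetical and do not appear in any actual GCS; the in-process loop standing in for the ordered multicast is likewise an assumption made to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    payload: str
    # Hypothetical header field: None marks a full-group multicast,
    # otherwise it holds the target list of sub-group member names.
    targets: Optional[FrozenSet[str]] = None


class ToyOrderedMulticast:
    """Stand-in for the GCS layer: every member sees every message,
    and all members see the messages in the same order."""

    def __init__(self) -> None:
        self.members: List["FilteredMember"] = []

    def join(self, member: "FilteredMember") -> None:
        self.members.append(member)

    def multicast(self, msg: Message) -> None:
        # A single loop per message yields one uniform order for all members.
        for member in self.members:
            member.on_deliver(msg)


class FilteredMember:
    """A group member with a filter sub-layer above the ordered multicast."""

    def __init__(self, name: str, gcs: ToyOrderedMulticast) -> None:
        self.name = name
        self.gcs = gcs
        self.delivered: List[str] = []  # what the application actually sees
        gcs.join(self)

    def send_group(self, payload: str) -> None:
        self.gcs.multicast(Message(self.name, payload))

    def send_subgroup(self, payload: str, targets) -> None:
        # Send side of the filter: insert the target list into the message,
        # then use the normal ordered multicast.
        self.gcs.multicast(Message(self.name, payload, frozenset(targets)))

    def on_deliver(self, msg: Message) -> None:
        # Receive side of the filter: pass the message up only if this
        # member is on the target list (or the message is full-group).
        if msg.targets is None or self.name in msg.targets:
            self.delivered.append(msg.payload)


gcs = ToyOrderedMulticast()
a, b, c = (FilteredMember(name, gcs) for name in "ABC")
a.send_group("g1")
b.send_subgroup("s1", {"A", "B"})
c.send_group("g2")
```

In this sketch, members A and B see the sub-group message between the two full-group messages, while member C sees only the full-group messages, in the same relative order.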
Although preferred embodiments described herein are based on a GCS, it will be appreciated that the principles of the present invention may similarly be implemented in substantially any distributed computing environment in which there is a mechanism for ordered conveyance of multicast messages in a computing group or cluster.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for distributing messages among a group of member computing entities, which are mutually-linked in a distributed computing system and communicate in accordance with a communication protocol that delivers a sequence of full-group multicast messages to all of the members in the group in an order that is uniform among all of the members, the method including:
defining a sub-group from among the members in the group; and
distributing a sub-group message to the members in the sub-group, such that the sub-group message is delivered to all of the sub-group members in a uniform position with respect to the order of the full-group multicast messages.
Preferably, the communication protocol includes group communication system middleware, to which the members of the group convey the full-group multicast messages to be delivered, and which assigns the uniform order to the full-group multicast messages.
Further preferably, distributing the sub-group message includes adding an identification of the sub-group to a corresponding one of the full-group multicast messages conveyed to the group communication system for delivery to all of the members of the group, and filtering the corresponding full-group multicast message upon delivery to the members responsive to the identification of the sub-group. Most preferably, the sub-group message is included as a full-group multicast message in the uniform order assigned by the group communication system. Alternatively or additionally, filtering the corresponding full-group multicast message includes, for each of the members, discarding the sub-group message if the member does not belong to the identified sub-group.
Preferably, the group communication system middleware includes a filter sub-layer, added to a group communication protocol layer, which adds the identification of the sub-group to the message when one of the group members sends the sub-group message and which filters the sub-group message upon its delivery to the group members. Most preferably, defining the sub-group includes receiving a target list of the sub-group members, and adding the identification includes adding an indication of the target list to a header of the corresponding full-group multicast message.
Preferably, distributing the sub-group message includes distributing a plurality of sub-group messages, each having a respective, uniform position with respect to the order of the full-group multicast messages and with respect to the other sub-group messages. In a preferred embodiment, defining the sub-group includes defining multiple sub-groups within the group, such that at least some of the different ones of the plurality of sub-group messages are directed to different, respective sub-groups.
In another preferred embodiment, distributing the sub-group message includes delivering a full version of the message to the members of the sub-group and delivering a placeholder message to the members that are not in the sub-group so as to maintain the uniform position of the sub-group message with respect to the order of the full-group multicast messages.
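The placeholder embodiment above may be illustrated, purely as a sketch under stated assumptions, by the following Python fragment. It assumes that ordering is expressed as consecutive sequence numbers and that each member holds back messages until all earlier positions have arrived; neither detail is stated in the text. The names `Slot`, `Sequencer` and `Member` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Slot:
    seq: int                 # position in the uniform order
    payload: Optional[str]   # None marks a placeholder with no content


class Sequencer:
    """Assigns one position per message and builds a per-member slot:
    the full message for sub-group members, a placeholder for the rest."""

    def __init__(self, names) -> None:
        self.names = list(names)
        self.next_seq = 0

    def stamp(self, payload: str, targets=None) -> Dict[str, Slot]:
        seq = self.next_seq
        self.next_seq += 1
        return {
            name: Slot(seq, payload if targets is None or name in targets else None)
            for name in self.names
        }


class Member:
    """Delivers payloads strictly in sequence order, skipping placeholders."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next_seq = 0
        self.pending: Dict[int, Slot] = {}
        self.delivered: List[str] = []

    def receive(self, slot: Slot) -> None:
        self.pending[slot.seq] = slot
        # Drain every slot that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            ready = self.pending.pop(self.next_seq)
            if ready.payload is not None:
                self.delivered.append(ready.payload)
            self.next_seq += 1


seq = Sequencer("ABC")
s0 = seq.stamp("g1")
s1 = seq.stamp("s1", {"A", "B"})   # C receives only a placeholder
s2 = seq.stamp("g2")

c = Member("C")
for slots in (s2, s1, s0):          # slots arrive out of order at C
    c.receive(slots["C"])
```

Without the placeholder for position 1, member C in this sketch would wait indefinitely for a message it is never sent; with it, C can advance past the sub-group message and deliver the second full-group message in its proper position.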
In still another preferred embodiment, one of the group members is assigned to be a leader of the group, and distributing the sub-group message includes sending the message to the leader and receiving an order message from the leader, responsive to which the uniform position of the sub-group message is maintained with respect to the order of the full-group multicast messages.
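One possible reading of the leader-based embodiment above is sketched below in Python. The text does not specify the contents of the order message or how payloads travel, so the sketch assumes that the leader issues a small order message (sequence number, message identifier, target list) to every member, while the payload itself reaches only the intended recipients. All names (`Order`, `Leader`, `Member`, `receive_order`, `receive_payload`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Order:
    seq: int                            # position assigned by the leader
    msg_id: str                         # which message occupies that position
    targets: Optional[FrozenSet[str]]   # None means the full group


class Leader:
    """Assigns positions from a single counter shared by full-group
    and sub-group messages."""

    def __init__(self) -> None:
        self.next_seq = 0

    def order(self, msg_id: str, targets=None) -> Order:
        result = Order(self.next_seq, msg_id,
                       None if targets is None else frozenset(targets))
        self.next_seq += 1
        return result


class Member:
    def __init__(self, name: str) -> None:
        self.name = name
        self.payloads: Dict[str, str] = {}
        self.orders: Dict[int, Order] = {}
        self.next_seq = 0
        self.delivered: List[str] = []

    def receive_payload(self, msg_id: str, payload: str) -> None:
        self.payloads[msg_id] = payload
        self._drain()

    def receive_order(self, order: Order) -> None:
        self.orders[order.seq] = order
        self._drain()

    def _drain(self) -> None:
        # Deliver in the leader's order; skip positions addressed to others.
        while self.next_seq in self.orders:
            current = self.orders[self.next_seq]
            mine = current.targets is None or self.name in current.targets
            if mine:
                if current.msg_id not in self.payloads:
                    return  # payload has not arrived yet; wait for it
                self.delivered.append(self.payloads.pop(current.msg_id))
            del self.orders[self.next_seq]
            self.next_seq += 1


leader = Leader()
o0 = leader.order("m0")
o1 = leader.order("m1", {"A", "B"})
o2 = leader.order("m2")

a = Member("A")
a.receive_order(o1)
a.receive_payload("m2", "g2")
a.receive_payload("m1", "s1")
a.receive_order(o2)
a.receive_order(o0)
a.receive_payload("m0", "g1")
```

In this sketch, member A delivers the three messages in the leader's order regardless of the order in which payloads and order messages arrive, and a member outside the sub-group would simply step over position 1.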
There is also provided, in accordance with a preferred embodiment of the present invention, distributed computing apparatus, including:
a computer network; and
a group of computer nodes, mutually-linked by the network using a communication protocol that delivers a sequence of full-group multicast messages to all of the nodes in the group in an order that is uniform among all of the nodes, wherein the protocol is configured to accept a definition of a sub-group of the nodes and to distribute a sub-group message to the nodes in the sub-group, such that the sub-group message is delivered to all of the nodes in the sub-group in a uniform position with respect to the order of the full-group multicast messages.
In a preferred embodiment, a subnet of the computer network is defined corresponding to the definition of the sub-group.
There is further provided, in accordance with a preferred embodiment of the present invention, a computer software product for distributing messages among a group of member computing entities, which are mutually-linked in a distributed computing system, the product including a computer-readable medium in which computer program instructions are stored, which instructions, when read by the member computing entities, cause the entities to carry out a communication protocol such that a sequence of full-group multicast messages sent by one or more of the member entities are delivered to all of the members in the group in an order that is uniform among all of the members, and further enable an application running on at least one of the entities to define a sub-group from among the members in the group and to distribute a sub-group message to the members in the sub-group, such that the sub-group message is delivered to all of the sub-group members in a uniform position with respect to the order of the full-group multicast messages.
The present invention will be more fully understood from the following detailed description of the preferred embodiments thereof, taken together with the drawings in which: