In a computer network, plural computer systems are joined together to exchange information and share resources. Thus, a computer network is a distributed computing environment in which networked computer systems provide users with access to distributed resources, for example, remote files and databases or remote printers, and to distributed processing, for example, where an application is processed on two or more computer systems. In such a distributed computing environment, the components of an application may reside on different machines but work together. For example, each work station in a computer network often provides a user interface and local analysis and data processing, while larger host computers, for example, a file server or mainframe, may maintain a large set of data files, coordinate access to large databases and perform larger scale data processing. In another distributed computing environment, an instance of an application is distributed to plural computers within the network. Such a network can more reliably perform a requested task by having a second instance of an application available to perform the task in the event that a first instance of the application is unavailable. Processes by which tasks are reassigned amongst plural instances of an application based upon availability of the instances shall hereafter be termed “failover” processes. Similarly, distributing requested tasks among plural instances of an application makes any particular instance of the application less vulnerable to overloads. Processes by which tasks are distributed amongst plural instances of an application shall hereafter be termed “load balancing” processes.
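By way of illustration only, the two terms defined above can be sketched in a few lines of Python. The class name Instance, the function dispatch, and the "fewest pending tasks" rule are assumptions introduced for this sketch and are not drawn from any particular product: skipping unavailable instances models failover, and choosing the least busy of the remaining instances models load balancing.

```python
class Instance:
    """Hypothetical stand-in for one instance of an application."""

    def __init__(self, name):
        self.name = name
        self.available = True   # failover input: is this instance usable?
        self.pending = 0        # load balancing input: tasks already assigned


def dispatch(instances):
    """Assign one task and return the name of the instance that received it."""
    # Failover: instances that are unavailable are never considered.
    candidates = [inst for inst in instances if inst.available]
    if not candidates:
        raise RuntimeError("no instance of the application is available")
    # Load balancing (assumed rule): pick the instance with the fewest pending tasks.
    chosen = min(candidates, key=lambda inst: inst.pending)
    chosen.pending += 1
    return chosen.name
```

With two instances, successive tasks alternate between them while both are available; once one is marked unavailable, every subsequent task is reassigned to the other.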
In distributed processing environments such as these, each application or process must be able to communicate and exchange information with other applications or processes in the environment. Currently, many inter-application or inter-process exchanges are performed using a messaging technique commonly referred to as message queuing. In message queuing, a first (or “client”) process passes a message to request processing by a second (or “server”) process. The messages are queued at the server process by a queue manager to await handling by the server process. In turn, the server process returns an alert when the results are ready. One message-oriented middleware product which uses message queuing to enable processes to communicate and exchange information in a distributed computing environment is known as MQ Series messaging software and is commercially available through International Business Machines Corporation of Armonk, N.Y.
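The message queuing pattern described above may be sketched, under simplifying assumptions, using only the Python standard library. This sketch does not use MQ Series or any of its interfaces: an in-process queue.Queue stands in for the queue maintained by a queue manager, a second queue carries the server's replies, and the names server_process, run_demo, and the None shutdown sentinel are hypothetical.

```python
import queue
import threading


def server_process(request_queue, reply_queue):
    """Handle queued requests one at a time and post a reply for each."""
    while True:
        message = request_queue.get()      # blocks until a message is waiting
        if message is None:                # hypothetical shutdown sentinel
            break
        correlation_id, payload = message
        # The "processing" here is a placeholder: upper-case the payload.
        reply_queue.put((correlation_id, payload.upper()))


def run_demo():
    request_queue = queue.Queue()          # stands in for the server-side queue
    reply_queue = queue.Queue()            # carries results back to the client
    worker = threading.Thread(target=server_process,
                              args=(request_queue, reply_queue))
    worker.start()
    # The client passes its messages and does not wait on each one.
    for correlation_id, text in enumerate(["alpha", "beta"]):
        request_queue.put((correlation_id, text))
    request_queue.put(None)
    worker.join()
    return [reply_queue.get() for _ in range(2)]
```

Because the client only places messages on the queue, it is decoupled from the moment at which the server handles them; the correlation identifier lets the client match each reply to its request.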
MQ Series provides certain functionality which may be used in support of failover and/or load balancing processes. More specifically, MQ Series supports these processes by the use of clustered queues. In an MQ Series cluster, both a queue manager and an instance of the server process reside on each one of plural computer platforms. To enable clustering, however, MQ Series relies upon one or more repository queue managers that contain information about all queue managers and their components. Repository queue managers periodically send updates to each other to stay in synchronization. Typically, two or more, but not all, queue managers in a cluster are repository queue managers. The other queue managers in the cluster receive the information they need from the repository queue managers. Each of the queue managers in the cluster defines a common transmission queue and a common receiver channel. As a result, a message may be handled by any one of the queue managers.
When a cluster contains more than one instance of the same queue, MQ Series uses a workload management algorithm to determine the best queue manager to route a message to. The workload management algorithm selects the local queue manager as the destination wherever possible. If there is no instance of the queue on the local queue manager, the algorithm determines which destinations are suitable. Suitability is based on the state of the channel, including any priority assigned to the channel, and also the availability of the queue manager and queue. The algorithm then uses a round-robin approach to finalize its choice between the suitable queue managers.
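The selection steps described above may be sketched as follows. This is an illustrative model only and is not the MQ Series implementation: the names Destination and WorkloadManager are hypothetical, and the treatment of channel priority (only the highest-priority suitable destinations enter the round-robin rotation) is an assumption, since the description above states only that priority contributes to suitability.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Destination:
    """Hypothetical record of one queue manager as seen by the sender."""
    name: str
    has_queue: bool = True     # an instance of the target queue exists here
    channel_ok: bool = True    # state of the channel to this queue manager
    available: bool = True     # queue manager and queue are available
    priority: int = 0          # channel priority (higher preferred; assumed)


class WorkloadManager:
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self._turn = count()   # round-robin position

    def choose(self):
        # Step 1: prefer the local queue manager wherever possible.
        if self.local.has_queue and self.local.available:
            return self.local
        # Step 2: determine which remote destinations are suitable.
        suitable = [d for d in self.remotes
                    if d.has_queue and d.channel_ok and d.available]
        if not suitable:
            return None
        top = max(d.priority for d in suitable)
        best = [d for d in suitable if d.priority == top]
        # Step 3: round-robin among the remaining candidates.
        return best[next(self._turn) % len(best)]
```

Note that nothing in this model inspects whether the application reading from the chosen queue is actually running; only the queue manager, queue, and channel are considered.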
It should be readily appreciated that, in its current implementation, MQ Series clustering has a number of limitations when used to support failover and/or load balancing in a distributed processing environment. For example, requiring plural repository queue managers which periodically update one another in order to implement clustering would likely add considerable overhead to the delivery of messages within the distributed processing environment. Similarly, round-robin approaches are overhead-intensive. Additionally, the workload management algorithm used in MQ Series clustering focuses on the queue itself and does not give sufficient weight to other considerations. For example, while MQ Series clustering will determine if a queue manager is unavailable, it will not determine if the instance of the application taking the message off the queue is working.
Accordingly, this invention seeks to provide load balancing and/or failover capabilities within distributed processing environments which use MQ Series or other messaging services for exchanges between platforms while avoiding the various shortcomings of current MQ Series functionality which supports load balancing and/or failover. In particular, the invention discloses load balancing and failover processes suitable for use in distributed processing environments without significant effect on overhead.