Many existing messaging systems use a single messaging manager to manage the transmission, from a local system, of all messages which are destined for remote systems, and to handle receipt of all messages which are destined for the local system. An application program running on the local system which requires that a message be sent to a remote system connects to the local messaging manager and requests that it send the message to the required destination. This implies reliance on the availability of the single messaging manager for all communications. Any failure which affects that messaging manager has a significant effect on messaging throughput, since a full rollback and restart of the messaging manager is required before communications can resume.
It is known from U.S. Pat. Nos. 5,797,005 and 5,887,168 to provide a system allowing messages to be processed by any of a plurality of data processing systems in a data processing environment. A shared queue is provided to store incoming messages for processing by one of the plurality of data processing systems. A common queue server receives and queues the messages onto the shared queue so that they can be retrieved by a system having available capacity to process the messages. A system having available capacity retrieves a queued message, performs the necessary processing and places an appropriate response message back on the shared queue. Thus, the shared queue stores messages sent in either direction between clients requesting processing and the data processing systems that perform the processing. Because the messages are enqueued onto the shared queue, the messages can be processed by an application running on any of a plurality of systems having access to the queue. Automatic workload sharing and processing redundancy are provided by this arrangement. If a particular application that is processing a message fails, another application can retrieve that message from the shared queue and perform the processing without the client having to wait for the original application to be restarted.
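By way of illustration only, the shared queue arrangement described above may be sketched as a small single-process simulation. The names SharedQueue, Worker, backout and demo are hypothetical and do not appear in the cited patents; the in-memory deque merely stands in for a queue held in shared storage, and the failure of an application is modelled, as an assumption, by returning the retrieved message to the head of the queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SharedQueue:
    """Hypothetical stand-in for a queue accessible to several systems."""
    requests: deque = field(default_factory=deque)
    responses: dict = field(default_factory=dict)

    def put_request(self, msg_id, body):
        self.requests.append((msg_id, body))

    def get_request(self):
        # Returns None when no request is waiting.
        return self.requests.popleft() if self.requests else None

    def backout(self, item):
        # Assumption: a failed retrieval puts the message back at the head,
        # so that another system can retrieve it next.
        self.requests.appendleft(item)

    def put_response(self, msg_id, body):
        self.responses[msg_id] = body


class Worker:
    """One processing application; fail_on lists message ids it fails on."""

    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)

    def serve_one(self, queue):
        item = queue.get_request()
        if item is None:
            return
        msg_id, body = item
        if msg_id in self.fail_on:
            # Simulated application failure: the message is not lost.
            queue.backout(item)
            return
        queue.put_response(msg_id, f"{self.name}:{body.upper()}")


def demo():
    queue = SharedQueue()
    for msg_id, body in [(1, "alpha"), (2, "beta"), (3, "gamma")]:
        queue.put_request(msg_id, body)
    workers = [Worker("A", fail_on={1}), Worker("B")]
    # Workers take turns until every request has been answered.
    while queue.requests:
        for worker in workers:
            worker.serve_one(queue)
    return queue.responses


if __name__ == "__main__":
    print(demo())
```

In this sketch, message 1 is first retrieved by worker A, which fails, and is then processed by worker B without any restart of A, which corresponds to the processing redundancy described above.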
U.S. patent application Ser. No. 60/220,685 (attorney reference GB9-2000-032), which is commonly assigned to the present application and is incorporated herein by reference, discloses improved recovery from connection failures between a queuing subsystem and a shared queue, such failures being caused either by failure of a communications link or by failure of the queuing subsystem. Message data in a shared queue is communicated between message queuing subsystems by means of data structures contained in a coupling facility. Queuing subsystems other than the one which experienced the failure are notified of a connection failure to the coupling facility, and these queuing subsystems then share between them the recovery of the active units of work of the failed subsystem.
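By way of illustration only, the sharing of recovery work among the surviving queuing subsystems may be sketched as follows. The function name share_recovery, the subsystem names and the unit of work identifiers are hypothetical, and the round-robin division is an assumption made for the sketch; the referenced application is not stated here to divide the work in this particular way.

```python
def share_recovery(active_units_of_work, failed, subsystems):
    """Divide the failed subsystem's active units of work among its peers.

    active_units_of_work maps a unit of work id to its owning subsystem.
    Returns a mapping from each surviving subsystem to the ids it recovers.
    Round-robin assignment is an illustrative assumption only.
    """
    survivors = [s for s in subsystems if s != failed]
    if not survivors:
        raise ValueError("no surviving subsystem is available for recovery")
    # Only the units of work owned by the failed subsystem need recovery.
    orphaned = sorted(
        uow for uow, owner in active_units_of_work.items() if owner == failed
    )
    plan = {s: [] for s in survivors}
    for index, uow in enumerate(orphaned):
        plan[survivors[index % len(survivors)]].append(uow)
    return plan


if __name__ == "__main__":
    active = {"uow1": "QM1", "uow2": "QM2", "uow3": "QM1", "uow4": "QM1"}
    print(share_recovery(active, "QM1", ["QM1", "QM2", "QM3"]))
```

In this sketch the units of work still owned by healthy subsystems are left untouched, and only those of the failed subsystem are distributed among its peers.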
Although the solution of U.S. Ser. No. 60/220,685 provides significantly improved transactional recovery within a group of queuing subsystems, it does not address the problem of how to resume communications with communication managers outside the group in the event of failures affecting in-progress communications.