Increasingly, business applications such as ordering systems, which are implemented using computer hardware and computer software, are required to have high availability and reliability. Many businesses demand that their data processing systems be operational 24 hours a day and never lose data, and the best information technology companies have responded to those demands (in some cases achieving availability of data processing systems above 99.99%). Businesses typically also want high performance (high throughput without loss of reliability), which requires solutions that scale as data processing requirements increase, and they want to achieve this without high costs.
Highly available data processing systems have been developed using a combination of redundancy (of storage, processors and network connections) and recovery features (backup and failover) to avoid any single point of failure. One such solution includes a high availability database (HADB) that is distributed across a tightly integrated cluster of servers using redundant storage arrangements, such as a cluster of highly reliable IBM mainframe computers that act together as a single system image. A cluster of processors that combines data sharing with parallel processing to achieve high performance and high availability is sometimes referred to as a parallel systems complex or ‘parallel sysplex’. A typical HADB implemented in a parallel sysplex can handle multiple parallel requests for data retrieval and data updates from a large number of distributed requesters with high performance and reliability.
The HADB system can include a robust, high availability message processing system that combines message queues with business logic and routing functions to manage data access. This can provide assured once-only message delivery, and such systems can handle failures efficiently with reduced delays. However, the transaction management, redundancy management and recovery features that are typically implemented within such a high availability system incur significant processing overheads during normal request processing. Any such additional processing has a potential business cost, because high availability data processing systems are more expensive than less reliable systems. An example of this additional processing is a requirement for two phase commit processing within the HADB system or, more particularly, two phase commit processing between resources within the HADB system and resources outside the system. Also, implementing message queues within the HADB system typically requires logging of the message data within the HADB.
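By way of illustration only, the overhead associated with two phase commit processing can be sketched as follows. This is a minimal, single-process sketch and not the implementation of any particular HADB. The names `Participant` and `two_phase_commit` are hypothetical, and the in-memory `log` list merely stands in for the forced log writes that a real resource manager would perform. The sketch shows that each unit of work involves two rounds of interaction with every participating resource, plus at least one log record per round.

```python
class Participant:
    """Hypothetical resource manager taking part in a two phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"
        self.log = []  # stands in for forced writes to a recovery log

    def prepare(self):
        # Phase 1: record the vote before replying to the coordinator.
        self.state = "prepared" if self.can_commit else "aborted"
        self.log.append(self.state)
        return self.can_commit

    def commit(self):
        # Phase 2 (success path): make the prepared work permanent.
        self.state = "committed"
        self.log.append(self.state)

    def rollback(self):
        # Phase 2 (failure path): undo the prepared work.
        self.state = "aborted"
        self.log.append(self.state)


def two_phase_commit(participants):
    """Return True if every participant committed, False if the work was rolled back."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: roll back prepared work
        if p.state != "aborted":
            p.rollback()
    return False


# One resource inside the (hypothetical) HADB system and one outside it.
hadb_queue = Participant("hadb_queue")
external_db = Participant("external_db")
outcome = two_phase_commit([hadb_queue, external_db])
```

In this sketch a successful unit of work leaves two log records on each participant, which is the kind of per-request cost that the passage above refers to.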
An alternative solution is to employ a cluster of parallel message dispatchers that are separate from the HADB system, such as in a conventional application server cluster in which each server does not implement comprehensive high availability features. Parallel processing can improve throughput and reduce the impact of failures compared with a single message dispatcher, and separating the message dispatcher functions from the HADB system can reduce processing overheads. However, if the message dispatchers run on servers without high availability features, a failure which affects one server will delay the processing of the messages that have been sent to that server. This can be problematic despite the possibility of other messages being successfully processed by other message dispatchers in the meantime. The messages sent to a failed message dispatcher (referred to herein as ‘orphan messages’ or ‘orphan requests’) are typically delayed until that message dispatcher comes back on-line.
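The orphan message problem described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not a description of any actual clustered messaging product: the `Dispatcher` class, the round-robin `route` function and the `drain` function are hypothetical, and each dispatcher is assumed to hold its messages in a local queue that no other dispatcher can read.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical message dispatcher with a local, non-shared queue."""
    name: str
    online: bool = True
    queue: deque = field(default_factory=deque)


def route(messages, dispatchers):
    """Distribute messages round-robin across the dispatchers."""
    for i, msg in enumerate(messages):
        dispatchers[i % len(dispatchers)].queue.append(msg)


def drain(dispatchers):
    """Process queued messages on on-line dispatchers; return (processed, orphans)."""
    processed, orphans = [], []
    for d in dispatchers:
        if d.online:
            while d.queue:
                processed.append(d.queue.popleft())
        else:
            # These messages remain stranded until the dispatcher recovers.
            orphans.extend(d.queue)
    return processed, orphans


dispatchers = [Dispatcher("d0"), Dispatcher("d1"), Dispatcher("d2")]
route([f"m{i}" for i in range(6)], dispatchers)
dispatchers[1].online = False  # simulate a server failure after routing
processed, orphans = drain(dispatchers)
```

In this sketch the messages held by the surviving dispatchers are processed normally, while the messages that were routed to the failed dispatcher remain queued there and can only be processed by a later call to `drain` after that dispatcher is marked on-line again.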
Some known clustered messaging systems implement a number of features for fast recovery following a node failure, to reduce delays in the processing of orphan messages, but such approaches have not as yet fully solved the problem of delayed processing of orphan messages.