Large distributed applications are complex to manage. In particular, a significant amount of manual intervention is required to deal with the unreliable nature of distributed communications. This problem is exacerbated in very large-scale applications, which may run millions of processes that exchange many millions of messages. Traditionally, these large-scale applications require system administrators to manually compensate for and correct failed processes. Additionally, these applications require an extensive infrastructure that supports this intervention, such as dead-letter queues and suspended process reporting.
Problems may arise, for example, when two applications, such as two components of a business process, exchange messages. The applications may exchange messages through a broker. A sending instance may require a response to a message that it sends to a destination instance. However, if the destination instance ends before receiving the message, the message remains undelivered in the broker. As a result, the sending instance never receives a response and, therefore, becomes stalled. When a large distributed application is running millions of processes, thousands of unroutable messages may collect in the broker. A system administrator must manually clean up these unroutable messages and restart the stalled processes. For very large distributed applications, the volume of unroutable messages, stalled processes, and similar issues becomes too large for human operators to manage.
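The failure mode described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory broker that keeps one queue per destination instance; the names `Broker`, `run_scenario`, `sender-1`, and `dest-1` are illustrative only and do not correspond to any particular messaging product.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical in-memory broker: one queue per destination instance id."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.live = set()

    def register(self, instance_id):
        self.live.add(instance_id)

    def terminate(self, instance_id):
        # The instance ends; messages already queued for it are left behind.
        self.live.discard(instance_id)

    def send(self, destination, message):
        self.queues[destination].append(message)

    def receive(self, instance_id):
        # Only a live instance can take a message from its own queue.
        if instance_id in self.live and self.queues[instance_id]:
            return self.queues[instance_id].popleft()
        return None

    def unroutable(self):
        # Messages whose destination instance no longer exists.
        return {
            dest: list(queue)
            for dest, queue in self.queues.items()
            if queue and dest not in self.live
        }


def run_scenario():
    broker = Broker()
    broker.register("sender-1")
    broker.register("dest-1")

    # The sender posts a request that requires a response.
    broker.send("dest-1", {"reply_to": "sender-1", "body": "request"})

    # The destination instance ends before receiving the message.
    broker.terminate("dest-1")

    # The sender checks for a response; none can ever arrive.
    response = broker.receive("sender-1")
    sender_stalled = response is None
    return broker.unroutable(), sender_stalled
```

In this sketch, `run_scenario()` leaves the request in the queue for `dest-1` and reports the sender as stalled. Nothing in the broker detects or resolves either condition, which is the gap that an administrator would otherwise have to close by hand.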
These problems are not limited to large applications; they also arise in small applications. They arise both when using brokered messaging and when using correlated, one-way point-to-point messaging. Other high-scale scenarios, such as a large number of small hosted applications, may also need to address these issues.