It is increasingly common to distribute a data processing operation over a plurality of data processing units, with each of the units communicating over a data communications network such as the Internet. One reason for this is that a particular data processing unit may be able to do a job better than another, so a first unit makes a request for a second unit to do a particular job and then to return the result to the first unit.
It is also very common to have a large number of intermediate data processing units (also known as “nodes”) between the originating unit that makes a request and the destination unit that is being asked to do the work. Each intermediate unit receives the request, performs some initial processing to determine what to do with it, and then forwards it on to the next unit.
A popular mechanism for carrying out such distributed data processing is called asynchronous message queuing, where applications communicate with each other by sending messages to queues, which can then be accessed by the receiving application at a time that is convenient for the receiving application. IBM's WebSphere MQ (trademark) software product, which has been on the market for a number of years, is the most popular example of this type of software.
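The decoupling that asynchronous message queuing provides can be illustrated with a minimal in-process sketch. It uses Python's standard `queue` module purely as a stand-in for a real queuing product; the names `order_queue`, `send` and `receive_all` are hypothetical and are not taken from WebSphere MQ or any other product.

```python
import queue

# Hypothetical in-process stand-in for a message queue. A real queuing
# product would persist messages and span several machines.
order_queue = queue.Queue()

def send(q, body):
    """The sender places a message on the queue and returns immediately."""
    q.put(body)

def receive_all(q):
    """The receiver drains the queue at a time of its own choosing."""
    received = []
    while True:
        try:
            received.append(q.get_nowait())
        except queue.Empty:
            return received

# The sender runs first; the receiver does not need to be active yet.
send(order_queue, "order-1")
send(order_queue, "order-2")

# Later, the receiver collects whatever has accumulated, in arrival order.
messages = receive_all(order_queue)
```

The sender never waits for the receiver, which is the property that lets the receiving application access messages at a time convenient to it.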
Frequently, messages which flow between data processing units in an asynchronous message queuing network are considered “high value” messages. For such messages, it is very important that the originating data processing unit be able to locate them should they become lost on their way to the destination unit. The term “lost” is taken to mean that the message is safe, but its location is unknown. A message could become lost, for example, if a link between units is broken, if the target messaging address is not known on one of the intermediate nodes, or if the message is waiting to be processed by an application that is currently unavailable or performing poorly. In such situations, the message cannot advance towards the destination unit in a reasonable time until the link is repaired, the routing (i.e., address resolution) configuration on the node in error is corrected, the performance bottleneck is removed, or the failing application is restarted.
The only way known in the prior art to locate such lost messages is to have an operator “visit” (either physically or electronically) each of the nodes of the messaging network and search through the various message queues (e.g., the dead letter queues (DLQs) and the transmission queues (TXQs)). However, this is very time-consuming and inefficient. Various prior art teachings have employed a test message which is sent by an originating data processing unit into the network of intermediate units on its way to a destination unit. As the test message arrives at each intermediate unit, that unit sends a report back to the originating unit, so that the originating unit learns the exact path that the test message took through the network. For example, the well-known Advanced Peer-to-Peer Networking (APPN) and TCP/IP (Transmission Control Protocol/Internet Protocol) provide such functionality. U.S. Pat. No. 5,668,800 is another example of such prior art. However, such prior art identifies the path of a test message but does not locate the lost (application) message. At best, it provides an operator with a possible (but not guaranteed) route that a lost message might have taken.
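The cost of the manual search described above can be seen in a small sketch. The network layout, the node names and the function `find_by_exhaustive_search` are hypothetical. The point is only that every queue on every node must be inspected, because nothing narrows the search.

```python
def find_by_exhaustive_search(network, message_id):
    """Inspect every queue on every node and report where the ID appears."""
    hits = []
    for node_name, queues in network.items():
        for queue_name, messages in queues.items():
            for msg in messages:
                if msg["id"] == message_id:
                    hits.append((node_name, queue_name))
    return hits

# Hypothetical three-node network; each node has a dead letter queue (DLQ)
# and a transmission queue (TXQ).
network = {
    "node_a": {"DLQ": [], "TXQ": []},
    "node_b": {"DLQ": [{"id": "m-42", "body": "payment"}], "TXQ": []},
    "node_c": {"DLQ": [], "TXQ": [{"id": "m-17", "body": "invoice"}]},
}

location = find_by_exhaustive_search(network, "m-42")
```

The work grows with the total number of queued messages across the whole network, regardless of where the lost message actually is.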
U.S. Pat. No. 6,654,805 describes a data processing method of finding a lost message in an asynchronous message queuing network. The method includes the steps of: sending a first message from an originating data processing unit to an intermediate data processing unit, the first message including an indication that it is traceable; sending a tracer message from the originating data processing unit to the intermediate data processing unit, the tracer message identifying the first message as a lost message which the originating data processing unit would like to find; at the intermediate data processing unit, upon receiving the tracer message, checking whether the first message exists within the intermediate data processing unit, and sending a reply message back to the originating unit if it is found there; and, if the first message is not found within the intermediate data processing unit, determining whether the first message has passed through that unit and, if so, determining the neighboring data processing unit which received the first message from the intermediate data processing unit and forwarding the tracer message to that neighboring unit.
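The tracer steps listed above can be sketched roughly as follows. This is an illustrative model only, not the implementation of the cited patent: the `Node` class, its `held` and `forwarded` fields and the `trace` function are assumptions made for the example, and the hop-by-hop forwarding of the tracer message is simulated with a simple loop.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical model of a data processing unit in the network."""
    name: str
    # IDs of traceable messages currently queued on this node.
    held: set = field(default_factory=set)
    # For traceable messages that passed through: ID -> neighbor name.
    forwarded: dict = field(default_factory=dict)

def trace(network, start, message_id):
    """Follow the tracer from node to node.

    Returns the name of the node holding the message, or None if the
    trail ends without the message being found.
    """
    current = start
    visited = set()
    while current is not None and current not in visited:
        visited.add(current)
        node = network[current]
        if message_id in node.held:
            # Message found here: this is where a reply would be sent
            # back to the originating unit.
            return current
        # Not here: forward the tracer to the neighbor that received the
        # message, if this node has a record of the message passing through.
        current = node.forwarded.get(message_id)
    return None

# Hypothetical route a -> b -> c, with message "m-42" stuck on node b.
network = {
    "a": Node("a", forwarded={"m-42": "b"}),
    "b": Node("b", held={"m-42"}),
    "c": Node("c"),
}

found_at = trace(network, "a", "m-42")
```

Unlike an exhaustive search, the tracer visits only the nodes on the path the message actually took. Note that the lookup is keyed entirely on the message identifier.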
All of these prior art methods operate only on message identifiers, which in most cases are unknown to business domain users, rather than on message contents.