Publish/subscribe data processing systems have become very popular in recent years as a way of distributing data messages from publishing computers to subscribing computers. The increasing popularity of the Internet, which has connected a wide variety of computers all over the world, has helped to make such publish/subscribe systems even more popular. Using the Internet, a World Wide Web browser application (the term "application" or "process" refers to a software program, or portion thereof, running on a computer) can be used in conjunction with the publisher or subscriber in order to graphically display messages. Such systems are especially useful where data supplied by a publisher is constantly changing and a large number of subscribers needs to be quickly updated with the latest data. Perhaps the best example of where this is useful is in the distribution of stock market data.
In such systems, publisher applications of data messages do not need to know the identity or location of the subscriber applications which will receive the messages. The publishers need only connect to a publish/subscribe distribution agent process, which is included in a group of such processes making up a broker network, and send messages to the distribution agent process, specifying the subject of the message to the distribution agent process. The distribution agent process then distributes the published messages to subscriber applications which have previously indicated to the broker network that they would like to receive data messages on particular subjects. Thus, the subscribers also do not need to know the identity or location of the publishers. The subscribers need only connect to a distribution agent process.
One such publish/subscribe system which is currently in use, and which has been developed by the Transarc Corp. (a wholly owned subsidiary of the assignee of the present patent application, IBM Corp.), is shown in FIG. 1. Publishers 11 and 12 connect to the publish/subscribe broker network 2 and send published messages to broker network 2, which distributes the messages to subscribers 31, 32, 33, 34. Publishers 11 and 12, which are data processing applications which output data messages, connect to broker network 2 using the well known inter-application data connection protocol known as remote procedure call (or RPC). Each publisher application could be running on a separate machine; alternatively, a single machine could be running a plurality of publisher applications. The broker network 2 is made up of a plurality of distribution agents (21 through 27) which are connected in a hierarchical fashion which will be described below as a "tree structure". These distribution agents, each of which could be running on a separate machine, are data processing applications which distribute data messages through the broker network 2 from publishers to subscribers. Subscriber applications 31, 32, 33 and 34 connect to the broker network 2 via RPC in order to receive published messages.
Publishers 11 and 12 first connect via RPC directly to a root distribution agent 21 which in turn connects via RPC to second level distribution agents 22 and 23 which in turn connect via RPC to third level distribution agents 24, 25, 26 and 27 (also known as "leaf distribution agents" since they are the final distribution agents in the tree structure). Each distribution agent could be running on its own machine, or alternatively, groups of distribution agents could be running on the same machine. The leaf distribution agents connect via RPC to subscriber applications 31 through 34, each of which could be running on its own machine.
In order to allow the broker network 2 to determine which published messages should be sent to which subscribers, publishers provide the root distribution agent 21 with the name of a distribution stream for each published message. A distribution stream (called hereinafter a "stream") is an ordered sequence of messages having a name (e.g., "stock" for a stream of stock market quotes) to distinguish the stream from other streams. Likewise, subscribers provide the leaf distribution agents 24 through 27 with the names of the streams to which they would like to subscribe. In this way, the broker network 2 keeps track of which subscribers are interested in which streams so that when publishers publish messages to such streams, the messages can be distributed to the corresponding subscribers. Subscribers are also allowed to provide filter expressions to the broker network in order to limit the messages which will be received on a particular stream (e.g., a subscriber 31 interested in only IBM stock quotes could subscribe to the stream "stock" by making an RPC call to leaf distribution agent 24 and include a filter expression stating that only messages on the "stock" stream relating to IBM stock should be sent to subscriber 31).
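The stream and filter mechanism described above can be illustrated with a minimal Python sketch. It is not the Transarc implementation: the class names, the in-process method calls standing in for RPC, the assignment of leaf agents 24 and 25 to agent 22 and of leaf agents 26 and 27 to agent 23, and the attachment of subscriber 32 to leaf agent 25 are all assumptions made for illustration. For brevity the sketch forwards every message down every branch, whereas the system described tracks which subscribers are interested in which streams.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]
Filter = Callable[[Message], bool]


@dataclass
class Subscriber:
    """Hypothetical subscriber application that records what it receives."""
    name: str
    inbox: List[Tuple[str, Message]] = field(default_factory=list)

    def deliver(self, stream: str, message: Message) -> None:
        self.inbox.append((stream, message))


@dataclass
class DistributionAgent:
    """Hypothetical distribution agent; method calls stand in for RPC."""
    name: str
    children: List["DistributionAgent"] = field(default_factory=list)
    # stream name -> list of (subscriber, optional filter expression)
    subscriptions: Dict[str, List[Tuple[Subscriber, Optional[Filter]]]] = field(
        default_factory=dict
    )

    def subscribe(self, subscriber: Subscriber, stream: str,
                  filter_expr: Optional[Filter] = None) -> None:
        self.subscriptions.setdefault(stream, []).append((subscriber, filter_expr))

    def distribute(self, stream: str, message: Message) -> None:
        # Deliver to local subscribers whose filter (if any) accepts the message.
        for subscriber, filter_expr in self.subscriptions.get(stream, []):
            if filter_expr is None or filter_expr(message):
                subscriber.deliver(stream, message)
        # Simplification: forward to every child rather than only to
        # branches known to contain interested subscribers.
        for child in self.children:
            child.distribute(stream, message)


# Tree of FIG. 1 (parent/child assignment below the root is assumed).
leaf24, leaf25, leaf26, leaf27 = (
    DistributionAgent(f"agent{n}") for n in (24, 25, 26, 27)
)
agent22 = DistributionAgent("agent22", [leaf24, leaf25])
agent23 = DistributionAgent("agent23", [leaf26, leaf27])
root21 = DistributionAgent("agent21", [agent22, agent23])

sub31 = Subscriber("subscriber31")
sub32 = Subscriber("subscriber32")
# Subscriber 31 wants only IBM quotes on the "stock" stream.
leaf24.subscribe(sub31, "stock", lambda m: m["symbol"] == "IBM")
# Subscriber 32 wants every message on the "stock" stream.
leaf25.subscribe(sub32, "stock")

# Publishers hand messages to the root, naming the stream.
root21.distribute("stock", {"symbol": "IBM", "price": 101.5})
root21.distribute("stock", {"symbol": "XYZ", "price": 20.0})
```

After the two publications, subscriber 31 holds only the IBM quote, while subscriber 32 holds both messages.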
The above-described publish/subscribe architecture provides the advantage of central coordination of all published messages, since all publishers must connect to the same distribution agent (the root) in order to publish a message to the broker network. For example, total ordering of published messages throughout the broker network is greatly facilitated, since the root can easily assign sequence numbers to each published message on a stream. However, this architecture also has the disadvantage of publisher inflexibility, since each publisher is constrained to publishing from the single root distribution agent, even when it would be much easier for a publisher to connect to a closer distribution agent.
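The ordering advantage of a single entry point can be sketched as follows. The sketch assumes one counter per stream held at the root; the `RootSequencer` name and the tuple layout of a stamped message are hypothetical and are not taken from the system described.

```python
from collections import defaultdict
from itertools import count


class RootSequencer:
    """Hypothetical root agent that stamps each message with a per-stream number."""

    def __init__(self):
        # One independent counter per stream, starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def stamp(self, stream, payload):
        # Because every publisher must pass through the root, the order in
        # which the root stamps messages defines a total order per stream.
        return (stream, next(self._counters[stream]), payload)


root = RootSequencer()
stamped = [
    root.stamp("stock", "IBM 101.5"),  # e.g., from publisher 11
    root.stamp("stock", "IBM 101.7"),  # e.g., from publisher 12
    root.stamp("news", "headline"),
    root.stamp("stock", "IBM 101.6"),
]
```

With publishers free to enter at any distribution agent, no single process sees every message, so a simple counter of this kind is no longer sufficient.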
Accordingly, publish/subscribe software designers are beginning to consider architectures where publishers are allowed to publish messages directly to any distribution agent in the broker network. This clearly has the advantage of removing the above-mentioned constraint on publishers. However, as with any tradeoff, it presents other problems. One of the major problems is that a subscriber application is not given any assurance (guarantee) that the subscriber application will receive publications from all publisher applications that might publish to any possible broker (since a publisher application can publish to any broker in this type of architecture). This is because the subscriber is not visible to all potential publishers until the subscription has reached all brokers where there is a potential publisher.
Specifically, when a subscriber application located in London registers a new subscription, it communicates directly with its local broker (also referred to herein as a distribution agent), also located in London, and registers its subscription to a given topic with the local broker. The local broker then returns an acknowledgement to inform the subscriber that the subscription has been received. The local broker then passes along the subscription to other brokers (e.g., in New York and Hong Kong) in the broker network so that no matter which broker a publisher communicates directly with, published messages from such a publisher will be distributed to the subscriber. However, the subscriber cannot be sure that it will receive published messages from publishers no matter which broker a publisher happens to connect to. Instead, the subscriber can only be assured that it will receive published messages from a publisher that communicates directly with the same local broker with which the subscriber has registered its subscription.
For example, the subscriber in London cannot be sure that its subscription has been sent to the broker in Hong Kong. If, for some reason, the subscription data reaches New York but not Hong Kong, the subscriber in London will receive messages published in London and New York but not those published in Hong Kong. Moreover, the London subscriber will have no idea that it is not receiving any messages from Hong Kong, because the content of a message often does not indicate where the published message originated.
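The London, New York and Hong Kong scenario can be reproduced with a small Python sketch. The `Broker` class and its methods are hypothetical, and propagation between brokers is simulated by registering the subscription by hand at each broker it is assumed to have reached. Each delivered message is tagged with the broker at which it was published purely so that the gap is observable here; as noted above, real messages often carry no such indication.

```python
class Broker:
    """Hypothetical broker at which publishers and subscribers may connect."""

    def __init__(self, name):
        self.name = name
        self.known_subscriptions = {}  # topic -> list of subscriber inboxes

    def register(self, topic, inbox):
        self.known_subscriptions.setdefault(topic, []).append(inbox)
        # The acknowledgement confirms receipt at this broker only; it says
        # nothing about whether other brokers have learned of the subscription.
        return "ack"

    def publish(self, topic, message):
        # A broker can only serve the subscriptions it has learned about.
        for inbox in self.known_subscriptions.get(topic, []):
            inbox.append((self.name, message))


london = Broker("London")
new_york = Broker("New York")
hong_kong = Broker("Hong Kong")

inbox = []  # the London subscriber's received messages
ack = london.register("stock", inbox)

# Simulated propagation: the subscription reaches New York but,
# for some reason, never reaches Hong Kong.
new_york.register("stock", inbox)

# One publisher connects to each broker and publishes a quote.
for broker in (london, new_york, hong_kong):
    broker.publish("stock", "quote")

origins = [origin for origin, _ in inbox]
```

The subscriber holds an acknowledgement and two quotes, and nothing in what it has received reveals that the Hong Kong publication was never delivered.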
As the information being exchanged in such a publish/subscribe broker network is often of critical importance, this lack of certainty can be a serious problem that inhibits the wide-scale deployment of such broker network systems. That is, given this problem, people may opt for a more direct means of disseminating messages (where information consumers connect directly to information providers) rather than use an intermediary broker, thereby forgoing the many advantages that a broker network makes possible.
There is thus a great need in the publish/subscribe broker network art for a way of providing subscribers with a higher degree of certainty that they will receive published messages irrespective of the location of the broker from which these published messages have entered the broker network.