Within a messaging network, messages may be delivered from one data processing system to another via one or more “message brokers” that provide routing and, in many cases, transformations and other services. The brokers are typically located at communication hubs within the network, although broker functions may be implemented at various points within a distributed broker network.
Many message brokers support the publish/subscribe communication paradigm. This involves publishers sending communications that can be received by a set of subscribers who have registered their interest in receiving communications of that type, typically without the publishing application needing to know which subscribers are interested. Publish/subscribe allows subscribers to receive the latest information in an area of interest (for example, stock prices or events such as news flashes or special offers) without having to proactively and repeatedly request that information from each of the publishers.
A typical publish/subscribe environment has a number of publisher applications sending messages via a broker to a potentially large number of subscriber applications located on remote devices across the network. The publishers are decoupled from the subscribers, since communication via an intermediate broker does not require a dedicated connection between each publisher and each subscriber, which greatly simplifies the network topology compared with the tightly-coupled conventional client-server paradigm. The subscribers register with a broker and identify the categories of information for which they wish to receive published messages, and this information is stored at the broker. In many publish/subscribe implementations, subscribers specify one or more topic names which represent the information they wish to receive. When publishers send their messages to the broker, the publishers assign topic names to the messages and the broker uses a matching engine to compare the topics of received messages with its stored subscription information for the registered subscribers. This comparison determines which subscribers the message should be forwarded to. Topics are often specified hierarchically, for example using the character string format “root/level1topicName/level2topicName”, to enable topics specified within received messages to be compared with subscriptions using a matching algorithm that iteratively steps through the topic hierarchy. Subscriptions can be associated with nodes within a topic tree.
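By way of illustration only, the topic-based matching described above might be sketched as follows. The class and method names (TopicNode, TopicTree, subscribe, match) are hypothetical and are not taken from any particular broker product. The sketch assumes exact-match subscriptions with no wildcard support.

```python
class TopicNode:
    """One node in the topic tree; holds the subscribers registered at this level."""

    def __init__(self):
        self.children = {}        # next-level topic name -> TopicNode
        self.subscribers = set()  # subscriber ids registered at this node


class TopicTree:
    """Hypothetical matching engine keyed on 'root/level1/level2' topic strings."""

    def __init__(self):
        self.root = TopicNode()

    def subscribe(self, subscriber_id, topic):
        # Walk (and create where needed) one node per level of the topic string.
        node = self.root
        for level in topic.split("/"):
            node = node.children.setdefault(level, TopicNode())
        node.subscribers.add(subscriber_id)

    def match(self, topic):
        # Step through the hierarchy one level at a time; an unknown level
        # means no subscription is stored for this topic.
        node = self.root
        for level in topic.split("/"):
            node = node.children.get(level)
            if node is None:
                return set()
        return set(node.subscribers)
```

For example, after tree.subscribe("app1", "root/stocks/IBM"), a call to tree.match("root/stocks/IBM") returns a set containing "app1", while tree.match("root/news") returns an empty set.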
Although subscription matching often involves checking topic fields within message headers, the matching may additionally or alternatively involve checking other message header fields or checking message content, and filtering messages based on the additional information. For example, a message broker implementing the Java™ Message Service (JMS) typically allows filtering based on message properties (but not based on the application data that is the message content or ‘payload’). A message broker may perform additional functions, for example performing data content or format transformations or otherwise processing received messages before forwarding them to subscribers. (Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc.)
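By way of illustration only, filtering on message properties rather than on the payload might be sketched as follows. This is not the JMS API or its selector syntax. The Message type and the matches_properties function are hypothetical names, and the sketch assumes simple equality tests on property values.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message: a topic, header properties, and an opaque payload."""
    topic: str
    properties: dict = field(default_factory=dict)
    payload: bytes = b""


def matches_properties(message, required):
    """Return True when every required property is present with the expected value.

    Only the header properties are inspected; the payload is never examined.
    """
    return all(
        key in message.properties and message.properties[key] == value
        for key, value in required.items()
    )
```

A subscription could then store a dictionary such as {"region": "EU"} alongside its topic, and the broker would forward only those matching-topic messages for which matches_properties returns True.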
A commercially available example of a message broker product that supports the publish/subscribe paradigm and allows filtering by message contents is IBM Corporation's WebSphere Message Broker, as described in the documents “IBM WebSphere Message Broker Version 6 Release 0—Introduction”, IBM Corporation, July 2006, and “IBM WebSphere Message Broker Version 6 Release 0—Publish/Subscribe”, IBM Corporation, July 2006. A message broker may be associated with an underlying message delivery product that handles the complexity of providing assured message delivery over a heterogeneous network. IBM Corporation's WebSphere MQ messaging products are examples of products providing such messaging functions, and are described in a number of publications from IBM Corporation including IBM publication reference No. GC34-6590-01 “WebSphere MQ Clients”, June 2005. (IBM and WebSphere are registered trademarks of International Business Machines Corporation).
The publish/subscribe paradigm is an efficient way of disseminating information to multiple users, and is especially useful for environments in which the set of publishers and/or subscribers can change over time and where the number of publishers and/or subscribers can be large. Some subscriptions remain active only while a subscribing application is connected to the broker. These subscriptions are referred to as ‘non-durable’. Because ‘non-durable’ subscribers are likely to miss many desired publications, many other subscriptions are ‘durable’ and remain active until the subscribing application explicitly unsubscribes. Publications that match the subscription of a disconnected ‘durable’ subscriber are held at the broker for retrieval when the subscriber reconnects. When a ‘durable’ subscriber no longer wishes to receive publications, the subscriber can unsubscribe from the broker (or unsubscribe from a particular topic or set of topics).
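By way of illustration only, the difference between durable and non-durable subscriptions might be sketched as follows. The Broker and Subscription classes and their methods are hypothetical. The delivered list stands in for an actual network send, and publications held for a disconnected durable subscriber are assumed to be kept in memory.

```python
from collections import deque


class Subscription:
    def __init__(self, durable):
        self.durable = durable
        self.connected = True
        self.pending = deque()  # publications held while disconnected
        self.delivered = []     # stand-in for sending over the network


class Broker:
    def __init__(self):
        self.subscriptions = {}  # (subscriber_id, topic) -> Subscription

    def subscribe(self, subscriber_id, topic, durable=False):
        self.subscriptions[(subscriber_id, topic)] = Subscription(durable)

    def unsubscribe(self, subscriber_id, topic):
        self.subscriptions.pop((subscriber_id, topic), None)

    def disconnect(self, subscriber_id):
        for key in list(self.subscriptions):
            if key[0] != subscriber_id:
                continue
            sub = self.subscriptions[key]
            if sub.durable:
                sub.connected = False    # subscription stays active
            else:
                del self.subscriptions[key]  # non-durable: ends with the connection

    def reconnect(self, subscriber_id):
        for (sid, _topic), sub in self.subscriptions.items():
            if sid == subscriber_id:
                sub.connected = True
                while sub.pending:       # hand over everything held meanwhile
                    sub.delivered.append(sub.pending.popleft())

    def publish(self, topic, data):
        for (_sid, sub_topic), sub in self.subscriptions.items():
            if sub_topic != topic:
                continue
            if sub.connected:
                sub.delivered.append(data)
            else:
                sub.pending.append(data)
```

In this sketch a non-durable subscriber that disconnects simply disappears from the broker's table, whereas a durable subscriber accumulates publications until it reconnects or explicitly unsubscribes.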
Although this ability to subscribe and unsubscribe leaves the durable subscriber in control of which publications they receive, there is typically some latency in the performance of each subscriber-initiated subscribe and unsubscribe operation at the broker. In a communications environment that relies on low bandwidth or unreliable connections between a subscriber and a broker, the latency could result in a significant delay before a subscriber can obtain any publications. After a subscribe operation, there may also be a considerable delay before the broker receives any publications that match the new subscriber's subscription. For some subscriber applications, such delays will be acceptable; but some subscriber applications require published information as soon as possible.
Some publish/subscribe brokers delete each publication after the publication has been forwarded to the set of currently-registered subscribers. With such brokers, each subscriber only receives publications that were received by the broker after their respective subscription information is registered by the broker. However, some publish/subscribe brokers implement an optional ‘retain’ policy whereby the broker retains a copy of the last publication received by the broker for certain topics (typically retaining only one message per topic). Such retained publications may be held in cache memory or other storage at the broker. This can be useful for new subscribers who wish to quickly receive the latest publication on their topics of interest, without having to wait for a new publication to be sent by the respective publisher(s), and for subscribers to topics for which publications are infrequent.
As an example, consider a currency converter application running on a mobile telephone or PDA. The application requires published foreign currency exchange rates to perform a currency conversion. It could be very misleading to rely on the exchange rates that were published on a different day when the currency converter application was last invoked, and so the application needs to obtain recent exchange rate information from a publish/subscribe broker. However, the user may not want to wait for the exchange rate publisher to send out its next broadcast publication. The application user may be trying to make a quick decision about whether to purchase a commodity, and waiting several minutes or even several seconds for the next publication of exchange rate information may be unacceptable. If the broker retains the most recent exchange rate publication it has received, this can be forwarded to a newly-subscribed currency converter application as soon as it subscribes, without waiting for the publisher's next publication.
In a typical implementation of retained publications, a publisher sets a retain flag and the broker responds to the retain flag by retaining the publication. The publisher may also specify an expiry time for retained publications (after which the published data may be invalid or unhelpful). The broker deletes a retained message when the expiry time is reached.
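By way of illustration only, a retain flag with an optional expiry time might be handled as sketched below. The RetainingBroker class, its parameter names, and the injectable clock are hypothetical. Forwarding to currently registered subscribers is omitted, and the sketch assumes that an expired publication is deleted lazily, on the next access, rather than at the exact moment the expiry time is reached.

```python
import time


class RetainingBroker:
    """Hypothetical broker that keeps at most one retained publication per topic."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so that expiry can be tested
        self._retained = {}   # topic -> (data, expires_at or None)

    def publish(self, topic, data, retain=False, expiry_seconds=None):
        # Only publications flagged by the publisher are retained; a newer
        # retained publication replaces the previous one for the same topic.
        if retain:
            expires_at = None
            if expiry_seconds is not None:
                expires_at = self._clock() + expiry_seconds
            self._retained[topic] = (data, expires_at)

    def subscribe(self, topic):
        """Return the retained publication for the topic, or None if there is none."""
        entry = self._retained.get(topic)
        if entry is None:
            return None
        data, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._retained[topic]  # expired: delete instead of forwarding
            return None
        return data
```

In the currency converter example, the publisher would call publish with retain=True, and a newly subscribing application would receive the latest retained rate from subscribe immediately, provided the rate had not yet expired.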
In other applications, it may be helpful for a new subscriber to receive more than just the last publication. Perhaps cumulative information is more helpful than just seeing a single publication. One solution to this problem is to use a replay server in addition to the publish/subscribe message broker. The replay server retains a large volume of data such that previously published messages are available and can be retrieved if and when required, but the associated processing and storage overheads are correspondingly large. The replay server is a separate entity from the publish/subscribe broker, and so integrating it with the subscription matching and the content and format transformations of the broker is a non-trivial task.
Another potential solution is for the broker to retain a predefined number, N, of publications for each topic. However, reliance on a predefined number is inflexible and still leaves the problem of how to decide on a suitable value of N (balancing storage overheads against the benefits of retained publications to new subscribers). A predefined value, N, also does not help to identify which sets of publications have cumulative significance and should be grouped together, and which publications should be handled independently. The potential relationships between publications are disregarded by a typical publish/subscribe broker, and so grouping of publications that have cumulative significance currently relies on analysis by the subscriber applications.
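By way of illustration only, retention of a fixed number N of publications per topic might be sketched as follows. The LastNBroker class is a hypothetical name. The sketch shows the limitation noted above: the oldest publication is discarded purely by count, with no regard to whether it belongs together with the publications that remain.

```python
from collections import defaultdict, deque


class LastNBroker:
    """Hypothetical broker that retains the last n publications for each topic."""

    def __init__(self, n):
        # A bounded deque silently drops its oldest entry once n is exceeded.
        self._history = defaultdict(lambda: deque(maxlen=n))

    def publish(self, topic, data):
        self._history[topic].append(data)

    def subscribe(self, topic):
        """Return the retained publications for the topic, oldest first."""
        return list(self._history.get(topic, ()))
```

With n set to 2, publishing three messages on one topic leaves only the second and third available to a new subscriber, even if all three had cumulative significance.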