Publish/subscribe data processing systems have become very popular in recent years as a way of distributing data messages. Publishers are not concerned with where their publications are going, and subscribers are not interested in where the messages they receive have come from. Instead, a message broker typically assures the integrity of the message source, and manages the distribution of the message according to the valid subscriptions registered in the broker.
Publishers and subscribers may also interact with a network of brokers, each one of which propagates subscriptions and forwards publications to other brokers within the network. Therefore, when the term “broker” is used herein it should be taken as encompassing a single broker or multiple brokers working together as a network to act as a single broker.
FIG. 1 illustrates a typical publish/subscribe data processing system according to the prior art. A message broker 15 has an input mechanism 20 which may be, for example, an input queue or a synchronous input node by which messages are input when they are sent by a publisher 5; 10 to the message broker. A published message is fetched from the input mechanism by a controller 40 and processed to determine, amongst other things, to which subscribers 60; 65; 70 the message should be sent.
Message topics typically provide the key to the delivery of messages between publishers and subscribers. The broker attempts to match a topic string on a published message with a list of clients who have subscribed to receive publications including that topic string. A matching engine 30 is provided in the message broker for this very purpose. When the subscriber registers, it must typically specify a means by which it wants to receive messages (which may be a queue or other input mechanism) and a definition of the types of messages that it is interested in. A subscriber can specify that it wishes to receive messages including a topic string such as “employee/salary” and any messages matching that topic string will be identified and forwarded on to the subscriber via an output mechanism 50. (Note, there may be more than one input and output mechanism to and from which messages are received and sent by the message broker.)
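By way of illustration only, the topic-based matching described above might be sketched as follows. This is a minimal sketch and not the broker of FIG. 1: the class name `Broker`, the subscriber id `subscriber-60`, the use of a plain list as the subscriber's receiving queue, and the payload values are all assumptions made for the example, and only exact topic-string matches are handled.

```python
from collections import defaultdict


class Broker:
    """Toy broker (hypothetical): maps exact topic strings to subscriber queues."""

    def __init__(self):
        # topic string -> list of (subscriber id, queue the subscriber reads from)
        self.subscriptions = defaultdict(list)

    def subscribe(self, subscriber_id, topic, queue):
        # The subscriber names the topic it wants and the means of delivery.
        self.subscriptions[topic].append((subscriber_id, queue))

    def publish(self, topic, payload):
        # Match the publication's topic string against registered subscriptions
        # and forward the message to every matching subscriber.
        delivered = []
        for subscriber_id, queue in self.subscriptions.get(topic, []):
            queue.append((topic, payload))
            delivered.append(subscriber_id)
        return delivered


broker = Broker()
inbox_60 = []  # stands in for the subscriber's receiving queue
broker.subscribe("subscriber-60", "employee/salary", inbox_60)

# Illustrative payload values only.
recipients = broker.publish("employee/salary", {"id": 1, "salary": 25000, "bonus": 6000})
unmatched = broker.publish("employee/address", {"id": 1})
```

Note that the publisher never names a recipient in this sketch; the subscription table alone decides where a message goes.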
It will be appreciated, however, that a subscriber to even a single topic may still receive a wealth of material that they are not actually interested in. It is therefore possible to further narrow the scope of a subscription by requesting only those publications having a specific content via the use of filters. For example, a subscription to the topic “employee/salary” might request only those messages published to that topic where the sum of the salary and the bonus is greater than £30,000.
FIG. 2 shows the processing performed at the broker in response to registration by a subscriber in accordance with the prior art. A subscriber typically sends a message to the broker to register their subscription. This may be of the form topic=employee/salary; filter=salary+bonus>30000. This is received at the broker and is interrogated by the matching engine in order to create a parsed representation of the subscription, which can be used to determine whether any publications received at the broker match the needs of the subscriber. The parsed representation is created in a match space 150 within the matching engine and typically consists of a hierarchical tree structure comprising a number of nodes depending either directly or indirectly from a root node. As can be seen from the example topic string given, a topic string may comprise a number of topic levels. Each one is stored as a node in the tree structure, with the first level topic name, employee, depending directly from the root node and each subsequent level depending from the previous level's node. Thus in the example, a salary node depends from the employee node. Information pertaining to the filter specified is then added into the tree structure. In the example given, a comparison needs to be done to determine whether an employee's salary plus their bonus equals an amount greater than 30,000. Therefore a “comparison” node depends from the salary node. Dependent upon that node is an “addition” node for carrying out an addition between two identifier nodes: salary; and bonus (both currently having no value). The addition is to be compared against a “constant” node containing the value 30,000.
Note, the filter information is typically stored in the match space separately from the main topic tree and takes the form of individual filter sub-trees. Pointers then point from the topic tree to the filter sub-trees (multiple topic branches may point to the same sub-tree to aid reusability). FIG. 2, however, shows the complete picture.
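The registration processing described above might be sketched, purely as an illustration, along the following lines. The names `TopicNode`, `MatchSpace` and `parse_filter`, the nested-tuple encoding of the filter sub-tree, and the second topic `contractor/salary` are assumptions made for the example. The parser accepts only filters of the form identifier+identifier>constant and is not a general expression parser.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    name: str
    children: dict = field(default_factory=dict)  # next topic level -> TopicNode
    filters: list = field(default_factory=list)   # pointers to shared filter sub-trees


def parse_filter(text):
    """Parse '<ident>+<ident>><constant>' into a comparison sub-tree (toy parser)."""
    left, constant = text.split(">")
    first, second = left.split("+")
    return (
        "comparison", ">",
        ("addition", ("identifier", first.strip()), ("identifier", second.strip())),
        ("constant", int(constant)),
    )


class MatchSpace:
    def __init__(self):
        self.root = TopicNode("root")
        self.filter_subtrees = {}  # filter text -> sub-tree, stored once and shared

    def register(self, topic, filter_text=None):
        # Walk or create one node per topic level, starting from the root.
        node = self.root
        for level in topic.split("/"):
            if level not in node.children:
                node.children[level] = TopicNode(level)
            node = node.children[level]
        # Point the last topic node at the (possibly pre-existing) filter sub-tree.
        if filter_text is not None:
            if filter_text not in self.filter_subtrees:
                self.filter_subtrees[filter_text] = parse_filter(filter_text)
            subtree = self.filter_subtrees[filter_text]
            if not any(existing is subtree for existing in node.filters):
                node.filters.append(subtree)
        return node


space = MatchSpace()
salary_node = space.register("employee/salary", "salary+bonus>30000")
other_node = space.register("contractor/salary", "salary+bonus>30000")
```

In this sketch both topic branches end up pointing at one and the same sub-tree object, which is the reuse referred to above. Even so, every registration still walks the topic levels and looks up the filter.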
When a publication is received at the message broker it can be parsed against the structure shown in FIG. 2 to determine whether there are any matches. FIG. 3 shows an exemplary format of a publication message according to the prior art. Such a message 120 consists of a header 100 and a payload 110. The header provides the information that the receiving system needs to know about the message including delivery details and message parameters and typically includes a description of the topic of the message as shown in the figure. The payload carries the actual data of the message and in this example contains an employee id; employee salary; and employee bonus.
Referring back to FIG. 2, the topic string in the publication message 120 is parsed through the tree structure to look for any matches. In this instance an employee node is found, as is a salary node (i.e. these correspond to the employee and salary data fields in payload 110). The values of these data fields are then input to these two nodes; added together; and then compared against the constant node. A value of TRUE or FALSE is returned dependent upon whether the total of this particular employee's salary and bonus is greater than 30,000. Since the addition comes to 31,000, a value of TRUE is returned to a selector node which sits between the last topic node (salary) and the first filter node (comparison predicate). (Note, there is typically a selector node for each topic branch.) The message is therefore forwarded on to any subscribers who have previously specified such a filter to narrow down the scope of messages received. Typically a couple of distribution lists of subscribers are attached to the selector node and correlated with the appropriate sub-tree (each sub-tree is named, e.g. 1) such that the broker knows who to forward messages on to. Of course the distribution lists do not have to be associated with this node; indeed, a selector node may not even exist. Instead the lists may, for example, be associated with the salary node (i.e. the last topic node). A table 156 lists those subscribers who have not scoped their messages with a filter, and a table 157 lists those subscribers who have specified a filter. The tables typically list subscribers by subscriber id. It is the appropriate selector node which typically makes the decision as to which subscribers to send a publication on to.
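The evaluation performed at publication time might be sketched as follows, again only as an illustration under stated assumptions. The nested-tuple encoding of the filter sub-tree, the function names `evaluate` and `select_recipients`, and the subscriber ids are hypothetical. The two lists merely stand in for tables 156 and 157. The individual salary and bonus values are invented so that they sum to the 31,000 of the example.

```python
import operator

# Comparison operators supported by this toy evaluator.
OPS = {">": operator.gt}


def evaluate(node, payload):
    """Recursively evaluate a filter sub-tree against a publication payload."""
    kind = node[0]
    if kind == "constant":
        return node[1]
    if kind == "identifier":
        return payload[node[1]]  # identifier nodes take their value from the payload
    if kind == "addition":
        return evaluate(node[1], payload) + evaluate(node[2], payload)
    if kind == "comparison":
        return OPS[node[1]](evaluate(node[2], payload), evaluate(node[3], payload))
    raise ValueError(f"unknown node kind: {kind}")


# Sub-tree "1": salary + bonus > 30000
FILTER_1 = (
    "comparison", ">",
    ("addition", ("identifier", "salary"), ("identifier", "bonus")),
    ("constant", 30000),
)

# Stand-ins for tables 156 (no filter) and 157 (filtered), keyed by subscriber id.
unfiltered_subscribers = ["subscriber-60"]
filtered_subscribers = {"1": (FILTER_1, ["subscriber-65"])}


def select_recipients(payload):
    """Play the part of the selector node for the employee/salary branch."""
    recipients = list(unfiltered_subscribers)
    for _name, (subtree, subscribers) in filtered_subscribers.items():
        if evaluate(subtree, payload):
            recipients.extend(subscribers)
    return recipients


# Hypothetical payload values chosen to total 31,000.
high = select_recipients({"id": 7, "salary": 25000, "bonus": 6000})
low = select_recipients({"id": 8, "salary": 20000, "bonus": 5000})
```

Here `high` includes the filtered subscriber because the comparison returns TRUE, whereas `low` reaches only the subscriber who registered without a filter.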
Thus, as previously mentioned, in order for matching to be possible at publication time, each time a subscription is registered it is parsed into the hierarchical tree structure, and each time a subscription is unregistered it has to be removed from the tree structure. The number of subscriptions registered can be large and may be volatile. Consequently this is not a trivial task. Furthermore, even when a new subscription is identical to a previously registered subscription (or even just the filter part of that subscription is identical), the subscription still has to be parsed through the existing tree structure to determine this, even if a part of the tree structure does not have to be created because it already exists. This is processor intensive and unnecessary.
There is a further related problem with prior art systems. In a typical publish/subscribe system, publications are received at the broker and forwarded straight on to their listed subscribers. No publications are actually stored at the broker. Thus if a subscriber subscribes to a particular topic in the hierarchy, they will only receive those publications which are sent to the broker subsequent to their registration. All publications received at the broker prior to a subscriber's registration are never seen by the subscriber.
To overcome this problem, retained publication systems are becoming more prevalent. In such a system, publications are stored at the broker. Every time a subscription is received at the broker, a parsed representation of the subscription (i.e. specified topic string and optionally filter expression) is created, if it does not exist already, and then the set of retained publications is searched to see whether any of them yield a match for the newly registered subscription. Any matches are then forwarded on to the subscriber, as are any new publications received at the broker subsequent to registration. The creation of a parsed representation and the subsequent evaluation of the retained publications to look for matches are also time-consuming and processor intensive.
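A retained publication arrangement of the kind just described might be sketched as follows. This is a simplified illustration, not a description of any particular broker. The class name `RetainingBroker` is hypothetical, the filter is reduced to a plain Python predicate, and the subscriber's queue is a plain list. The sketch retains every publication indefinitely, and the payload values are invented.

```python
class RetainingBroker:
    """Toy broker (hypothetical) that stores publications for later subscribers."""

    def __init__(self):
        self.retained = []       # (topic, payload) pairs kept at the broker
        self.subscriptions = []  # (topic, predicate, inbox) triples

    def publish(self, topic, payload):
        # Store the publication, then forward it to current matching subscribers.
        self.retained.append((topic, payload))
        for sub_topic, predicate, inbox in self.subscriptions:
            if sub_topic == topic and predicate(payload):
                inbox.append(payload)

    def subscribe(self, topic, inbox, predicate=lambda payload: True):
        # Register the subscription, then search every retained publication
        # for matches; this scan is the extra cost incurred at registration.
        self.subscriptions.append((topic, predicate, inbox))
        for retained_topic, payload in self.retained:
            if retained_topic == topic and predicate(payload):
                inbox.append(payload)


broker = RetainingBroker()
# Two publications arrive before anyone has subscribed.
broker.publish("employee/salary", {"id": 1, "salary": 25000, "bonus": 6000})
broker.publish("employee/salary", {"id": 2, "salary": 20000, "bonus": 5000})

late_inbox = []
broker.subscribe(
    "employee/salary",
    late_inbox,
    predicate=lambda p: p["salary"] + p["bonus"] > 30000,
)

# A publication arriving after registration is delivered in the usual way.
broker.publish("employee/salary", {"id": 3, "salary": 40000, "bonus": 0})
delivered_ids = [payload["id"] for payload in late_inbox]
```

The late subscriber receives the earlier matching publication (id 1) as well as the later one (id 3), at the price of scanning the whole retained set during registration.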