Publish/subscribe data processing systems have become very popular in recent years as a way of distributing data messages. Publishers are typically not concerned with where their publications are going, and subscribers are typically not interested in where the messages they receive have come from. Instead, a message broker typically assures the integrity of the message source, and manages the distribution of the message according to the valid subscriptions registered in the broker.
Publishers and subscribers may also interact with a network of brokers, each one of which propagates subscriptions and forwards publications to other brokers within the network. Therefore, when the term “broker” is used herein it should be taken as encompassing a single broker or multiple brokers working together as a network to provide brokering services.
FIG. 1 illustrates a typical publish/subscribe data processing system according to the prior art. A message broker 15 has an input mechanism 20 which may be, for example, an input queue or a synchronous input node by which messages are input when they are sent by a publisher 5; 10 to the message broker. A published message is fetched from the input mechanism by a controller 40 and processed to determine, amongst other things, to which subscribers 60; 65; 70 the message should be sent.
Message topics typically provide the key to the delivery of messages between publishers and subscribers. The broker attempts to match a topic string on a published message with a list of clients who have subscribed to receive publications including that topic string. A matching engine 30 is provided in the message broker for this very purpose. When the subscriber registers, it must typically specify a means by which it wants to receive messages (which may be a queue or other input mechanism) and a definition of the types of messages that it is interested in. A subscriber can specify that it wishes to receive messages including a topic string such as “employee/salary” and any messages matching that topic string will be identified and forwarded on to the subscriber via an output mechanism 50. (Note, there may be more than one input and output mechanism to and from which messages are received and sent by the message broker.)
It will be appreciated however that a subscriber to even a single topic may still receive a wealth of material that they are not actually interested in. It is therefore possible to further narrow the scope of a subscription by requesting only those publications having a specific content via the use of filters. For example, a subscription to the topic “employee/salary” might request only those messages published to that topic where the salary is greater than £30,000.
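By way of illustration only, the narrowing of a topic subscription by a content filter may be sketched as follows. The `Subscription` class, its field names, and the subscriber identifiers are hypothetical and are not taken from any particular broker; the sketch merely assumes that a publication carries a topic string and a set of named fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Subscription:
    """Hypothetical subscription: a topic plus an optional content filter."""
    subscriber_id: str
    topic: str
    content_filter: Optional[Callable[[dict], bool]] = None

    def matches(self, topic: str, fields: dict) -> bool:
        # The topic must match first; the filter only narrows further.
        if topic != self.topic:
            return False
        return self.content_filter is None or self.content_filter(fields)


# One subscriber takes every "employee/salary" publication,
# the other only those where the salary exceeds 30,000.
unfiltered = Subscription("sub-60", "employee/salary")
filtered = Subscription(
    "sub-65", "employee/salary", lambda f: f.get("salary", 0) > 30000
)
```

In this sketch a publication to “employee/salary” with a salary of 25,000 would reach only the unfiltered subscriber, whereas one with a salary of 45,000 would reach both.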
FIG. 2 shows the processing performed at the broker in response to registration by a subscriber in accordance with the prior art. A subscriber typically sends a message to the broker to register their subscription. This may be of the form topic=employee/salary; filter=salary>30000. This is received at the broker and is interrogated by the matching engine in order to create a parsed representation of the subscription, which can be used to determine whether any publications received at the broker match the needs of the subscriber. The parsed representation is created in a match space 100 within the matching engine and typically consists of a hierarchical tree structure comprising a number of nodes depending either directly or indirectly from a root node. As can be seen from the example topic string given, a topic string may comprise a number of topic levels. Each one is stored as a node in the tree structure, with the first level topic name, employee, depending directly from the root node and each subsequent level depending from the previous level's node. Thus in the example, a salary node depends from the employee node.
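The creation of the topic portion of the parsed representation may be sketched as follows. This is a minimal illustration under stated assumptions: the request is assumed to be a semicolon-separated list of key=value pairs in the form shown above, and `parse_subscription`, `TopicNode` and `add_topic` are hypothetical names rather than parts of an actual matching engine.

```python
def parse_subscription(request: str) -> dict:
    """Split 'topic=...; filter=...' into its named parts (assumed syntax)."""
    parts = {}
    for item in request.split(";"):
        key, _, value = item.strip().partition("=")
        parts[key] = value
    return parts


class TopicNode:
    """One topic level in the tree; children are keyed by level name."""
    def __init__(self, name):
        self.name = name
        self.children = {}


def add_topic(root, topic):
    """Create (or reuse) one node per topic level and return the last one."""
    node = root
    for level in topic.split("/"):
        node = node.children.setdefault(level, TopicNode(level))
    return node


root = TopicNode("root")
request = parse_subscription("topic=employee/salary; filter=salary>30000")
leaf = add_topic(root, request["topic"])
```

Here the employee node depends directly from the root node and the salary node depends from the employee node. Reusing existing nodes via `setdefault` means that a later subscription to the same topic would share the same branch.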
Information pertaining to the filter specified is then added into the tree structure. The filter information is typically stored in the match space separately from the main topic tree and takes the form of individual filter sub-trees. Pointers then point from the topic tree to the filter sub-trees (multiple topic branches may point to the same sub-tree to aid reusability). FIG. 2, however, shows the complete picture.
In the example given, a comparison needs to be done to determine whether an employee's salary is greater than 30,000. Therefore a “comparison” node depends from the salary node. Dependent upon that is a “salary” identifier node 110 (currently having no value) and a constant node containing the value 30,000.
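The filter sub-tree just described may be sketched as follows. The class names are hypothetical, and the treatment of an identifier node that has not yet received a value (the comparison evaluates to FALSE) is an assumption made for the purposes of the sketch.

```python
import operator


class IdentifierNode:
    """Holds an attribute name; its value is filled in from each publication."""
    def __init__(self, name):
        self.name = name
        self.value = None  # no value until a publication is parsed


class ConstantNode:
    """Holds the fixed value given in the subscription's filter."""
    def __init__(self, value):
        self.value = value


class ComparisonNode:
    """Compares its two dependent nodes using the given operator."""
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def evaluate(self):
        # Assumption: an identifier with no value cannot satisfy the filter.
        if self.left.value is None:
            return False
        return self.op(self.left.value, self.right.value)


# Sub-tree for the filter "salary > 30000".
salary = IdentifierNode("HDR1.HDR2.Salary")
comparison = ComparisonNode(operator.gt, salary, ConstantNode(30000))
```

Setting `salary.value` from a publication and calling `comparison.evaluate()` then yields the TRUE or FALSE result for that publication.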
Identifier nodes contain attributes or properties (e.g. a “salary” attribute) and the value of such an attribute, parsed in temporarily from a publication, is used in determining whether that publication is to be forwarded onto a particular subscriber. In this regard, such a node's name is therefore compared against the fields in a publication message to look for a match. This name will typically include a part uniquely identifying the attribute (e.g. “salary”). The full name of such a node however typically depends upon the type of subscription request received. Subscriptions may arrive at the broker in a variety of different formats according to the messaging protocol used.
FIG. 3 shows two example message formats A and B. It can be seen that message A has two headers (HDR1; HDR2) and a main body, whilst message B has three headers (JHDR1; JHDR2; JHDR3) and a body. In messages of type A the attribute, X (i.e. Salary), sits in the second header. In messages of type B this same attribute sits in the third header. Thus it will be appreciated that the attribute itself typically has a common description, but its location may vary between message formats. The position of the attribute is typically used as a naming protocol for the identifier node. Thus it can be seen from FIG. 2 that the subscription request is of message type A, since identifier node 110 has the name HDR1.HDR2.Salary, showing that the attribute Salary sits in the second header.
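The position-based naming of identifier nodes may be sketched as follows. The `ATTRIBUTE_PATH` table and the `identifier_name` function are hypothetical; the header sequences are those of message formats A and B of FIG. 3, and the dotted form follows the name HDR1.HDR2.Salary given above.

```python
# Header sequence leading to the attribute in each message format (FIG. 3).
ATTRIBUTE_PATH = {
    "A": ("HDR1", "HDR2"),             # attribute sits in the second header
    "B": ("JHDR1", "JHDR2", "JHDR3"),  # attribute sits in the third header
}


def identifier_name(message_format: str, attribute: str) -> str:
    """Build the identifier node name from the attribute's position."""
    return ".".join(ATTRIBUTE_PATH[message_format] + (attribute,))


name_a = identifier_name("A", "Salary")  # "HDR1.HDR2.Salary"
name_b = identifier_name("B", "Salary")  # "JHDR1.JHDR2.JHDR3.Salary"
```

The same attribute thus yields a different identifier node name for each message format.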
When a publication is received at the message broker it can be parsed against the structure shown in FIG. 2 to determine whether there are any matches. First, the topic string in the publication message is parsed through the tree structure. In this instance an employee node is found, as is a salary node. Identifier node 110 is then detected and the value from the salary data field of the publication message is input to this node and then compared against the constant node. A value of TRUE or FALSE is returned dependent upon whether this particular employee's salary is greater than 30,000. The TRUE or FALSE value is returned to a selector node which sits between the last topic node (salary) and the first filter node (comparison predicate). (Note, there is typically a selector node for each topic branch.) If a value of TRUE is returned, the message is forwarded on to any subscribers who have previously specified such a filter to narrow down the scope of messages received. Typically a couple of distribution lists of subscribers are attached to the selector node and correlated with the appropriate sub-tree (each sub-tree is named, e.g. 1) such that the broker knows who to forward messages on to. Of course the distribution lists do not have to be associated with this node; indeed, a selector node may not even exist. Instead the lists may, for example, be associated with the salary node (i.e. the last topic node). A table 156 lists those subscribers who have not scoped their messages with a filter, and a table 157 lists those subscribers who have specified a filter. The tables typically list subscribers by subscriber id. It is the appropriate selector node which typically makes the decision as to which subscribers to send a publication on to.
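The matching of a publication against the stored structure may be sketched as follows. This is a simplified illustration only: the `Selector` class, the `route` function, the nested-dictionary topic tree and the subscriber identifiers are hypothetical, the filter is fixed to a greater-than comparison, and the publication's fields are assumed to be available as a dictionary keyed by dotted attribute name.

```python
class Selector:
    """Hypothetical selector node holding the two distribution lists."""
    def __init__(self, identifier_name, threshold):
        self.identifier_name = identifier_name
        self.threshold = threshold
        self.unfiltered = []  # cf. table 156: subscribers with no filter
        self.filtered = []    # cf. table 157: subscribers with the filter

    def recipients(self, fields):
        result = list(self.unfiltered)
        value = fields.get(self.identifier_name)
        # The filtered list applies only when the comparison is TRUE.
        if value is not None and value > self.threshold:
            result.extend(self.filtered)
        return result


selector = Selector("HDR1.HDR2.Salary", 30000)
selector.unfiltered.append("sub-60")
selector.filtered.append("sub-65")

# Topic tree: root -> employee -> salary -> selector.
topic_tree = {"employee": {"salary": selector}}


def route(topic, fields):
    """Walk the topic levels, then let the selector choose the recipients."""
    node = topic_tree
    for level in topic.split("/"):
        if not isinstance(node, dict) or level not in node:
            return []  # no subscription exists for this topic
        node = node[level]
    return node.recipients(fields) if isinstance(node, Selector) else []
```

A call such as `route("employee/salary", {"HDR1.HDR2.Salary": 45000})` returns both lists of subscribers, whereas a salary of 25,000 returns only those who specified no filter.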
This process works so long as the publication message received at the broker is of the same format as the subscription request stored at the message broker. If the formats differ, the matching engine will not find the salary attribute when doing a comparison between the identifier node 110 (e.g. HDR1.HDR2.X) and a publication message of a different format (e.g. of type B) also including the relevant attribute. This is because the attribute is located differently in the publication message as compared with the subscription request. Thus in order to achieve a match between the identifier node in the tree structure and a message of type B, the identifier node 110 would have to be named JHDR1.JHDR2.JHDR3.X.
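The failure to match across message formats may be illustrated as follows. The sketch assumes, purely for illustration, that each publication is flattened into a dictionary keyed by the dotted location of each attribute; the `filter_matches` function and the sample values are hypothetical, and the attribute X is taken to be Salary.

```python
# The same salary carried in the two message formats of FIG. 3.
publication_a = {"HDR1.HDR2.Salary": 45000}
publication_b = {"JHDR1.JHDR2.JHDR3.Salary": 45000}


def filter_matches(identifier_name, fields, threshold=30000):
    """Look the identifier node's name up in the publication's fields."""
    value = fields.get(identifier_name)
    # A name that is absent from the message can never satisfy the filter.
    return value is not None and value > threshold


# Identifier node named for a subscription registered in format A.
same_format = filter_matches("HDR1.HDR2.Salary", publication_a)
cross_format = filter_matches("HDR1.HDR2.Salary", publication_b)
renamed = filter_matches("JHDR1.JHDR2.JHDR3.Salary", publication_b)
```

Although both publications carry a salary above the threshold, only the lookup using a name matching the publication's own format succeeds.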
It will thus be appreciated that a message broker is able to cope with publications and subscriptions of the same message format. The broker is not currently able, however, to forward publication messages on to subscribers who have registered their subscription using a different message format.
To reiterate, this is because such filter attributes are not standard and can therefore be located in any one of a number of different places according to the message format used. Topics on the other hand are typically standard across message formats and thus in a preferred embodiment such a strict naming convention within the tree structure for these is not required to indicate attribute location. A piece of code selected according to the type of message received is able to extract any topic information such that it can be parsed into a tree-like representation within the match space.