A “publish/subscribe” communication system is a type of messaging application in which the providers of information (publishers) are decoupled from the consumers of that information (subscribers) by means of an intermediate broker or other system component that implements subscription matching to identify information that is of interest to a particular subscriber. Subscriptions may specify topic names of interest, or may specify which information content is of interest. Typically, in a topic-based publish/subscribe messaging system, a number of publishers publish messages to a message broker on particular topics (e.g. news, weather, sport). Subscribers register their interest in such topics via subscription requests received at the broker. For example, a subscriber may be an application program or system that requests to receive any information published on the topic ‘weather’, whilst another subscriber may desire information on the topics ‘news’ and ‘sport’. Publishers do not need to be concerned with where their publications are going, and subscribers do not need to know where the messages they receive have come from. Instead, the broker manages the distribution of the messages to make sure that they arrive at the correct destination according to the valid subscriptions registered in the broker. The broker also ensures that messages are distributed in the correct format, and validates the authority of each publisher to publish to the subscribers which have subscribed to the particular topic encompassing the message.
In general terms, a publisher generates a message that it wants to publish and defines the topic of the message. The broker retrieves the message from its input node and passes it to a publication node for distribution to all subscribers that have registered an interest. Distribution of messages to subscribers may either be by point-to-point broadcast from the broker or may be by multi-casting. In the latter case, in order to reduce network traffic, messages on particular topics may be distributed to intermediate shared addresses which are provided to all the subscribers that have registered an interest in the particular topic so that they can listen in for newly published information.
Another approach to publish/subscribe communications employs a client-side subscription-matching component. That is, all publications from approved publishers are transmitted to each system running one or more subscriber applications. A component associated with the local subscriber application(s) determines which publications to delete (because they are of no interest to local subscribers) and which to pass to the local subscriber application(s).
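The client-side approach described above can be illustrated with a minimal Python sketch. The function name local_dispatch, the application names and the use of plain topic strings are assumptions made for illustration only and are not taken from any particular product.

```python
def local_dispatch(publications, local_interests):
    """Split incoming publications into those passed on and those discarded.

    publications: list of (topic, message) pairs received from the network.
    local_interests: dict mapping a local application name (hypothetical)
    to the set of topic strings that application has asked for.
    """
    passed, discarded = [], []
    for topic, message in publications:
        # Which local subscriber applications want this topic?
        wanted = [app for app, topics in local_interests.items() if topic in topics]
        if wanted:
            passed.append((topic, message, wanted))
        else:
            # Of no interest to any local subscriber, so it is dropped locally.
            discarded.append((topic, message))
    return passed, discarded


incoming = [("weather", "rain later"), ("sport", "2-1"), ("finance", "rates up")]
interests = {"dashboard": {"weather", "news"}, "ticker": {"sport"}}
passed, discarded = local_dispatch(incoming, interests)
```

In this sketch every publication reaches the subscriber system, and the cost of matching is paid locally rather than at a broker.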
In order to facilitate the identification of topics of interest with greater precision, specific topic syntaxes have been developed which are multi-level and which permit the use of wildcards. In this way, carefully defined sets of related topics can be covered by a single subscription.
One known publish/subscribe system of this type is further described in a document entitled “Publish/Subscribe” (Third Edition, February 2005) in the documentation library of the product WebSphere Business Integration Message Broker V5.0 from International Business Machines Corporation (“WebSphere” is a trademark of International Business Machines Corporation).
In this document, it is explained that a “topic” may be any character string that describes the nature of the data that is published in a publish/subscribe system. Topics are key to the successful delivery of messages. Instead of including a specific destination address in each message, a publisher assigns topics to the message. The broker matches the topic with a list of clients (subscribers) who have subscribed to that topic and delivers the message to each of those clients. Topics can be defined by a system administrator in advance but can also be defined when specified in a publication for the first time.
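By way of illustration only, the matching of a publication topic against a list of subscribed clients may be sketched in Python as follows. The class name Broker, its methods and the callback convention are hypothetical; the sketch handles discrete topic strings only and omits authorisation, formatting and persistence.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: exact topic strings only, no wildcards."""

    def __init__(self):
        # topic string -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher names a topic, not a destination; the broker
        # looks up who subscribed to that topic and delivers to each.
        delivered = 0
        for callback in self._subscribers.get(topic, []):
            callback(topic, message)
            delivered += 1
        return delivered


received = []
broker = Broker()
broker.subscribe("weather", lambda t, m: received.append(("A", t, m)))
broker.subscribe("news", lambda t, m: received.append(("B", t, m)))
broker.subscribe("sport", lambda t, m: received.append(("B", t, m)))

broker.publish("weather", "rain later")
broker.publish("sport", "final score 2-1")
```

Here subscriber A receives only the weather publication and subscriber B only the sport publication; neither publisher call names a recipient.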
Each topic defined becomes an element, or node, in a topic tree. The resulting tree is usually a hierarchical (multi-level) structure with one or more root topics. The nodes are identified by name and are combinable to define a narrower topic by specifying the names of nodes on successive levels of the hierarchy. The levels may be separated by the slash “/” character.
In the syntax employed in some products, publish/subscribe topics are thus identified by any character strings, separated by slashes. In addition to the slash “/”, special meaning also applies to the plus “+” and the hash “#” (also referred to as the pound sign in the US), which signify different types of wildcards. These special characters will now be discussed in more detail with the use of examples.
The slash character (“/”) denotes partitions within a topic name which are interpreted as levels in a tree, as explained above. For example “employee/hire/development” is a topic name with three levels indicating only information about employees hired within the development function. The slashes are used to define a hierarchy in the topic namespace. There is no limit to the number of levels in a topic tree and there may be any number of root nodes (that is, any number of topic trees).
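The interpretation of slash-separated topic names as levels in one or more trees may be sketched as follows. This is a minimal Python illustration under assumed names (TopicNode, add_topic, depth), not a description of how any given broker stores its topics.

```python
class TopicNode:
    """One level name in a topic tree."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # level name -> TopicNode


def add_topic(roots, topic):
    """Insert a slash-separated topic into a forest keyed by root name."""
    children = roots
    node = None
    for level in topic.split("/"):
        # Reuse the node for this level if it exists, otherwise create it.
        node = children.setdefault(level, TopicNode(level))
        children = node.children
    return node


def depth(topic):
    """Number of levels in a topic name."""
    return len(topic.split("/"))


roots = {}
add_topic(roots, "employee/hire/development")
add_topic(roots, "employee/leave")
add_topic(roots, "customer/order")
```

After these calls the forest has two root nodes, “employee” and “customer”, and the “employee” node has the children “hire” and “leave”.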
For greater flexibility, the hash character (“#”) is defined as a wildcard character which can match any number of partitions. Although some implementations allow use of the hash character only at the beginning or the end of a topic, this rule may not apply in other cases. Thus a subscription to “employee/#” will receive all messages with the topics “employee/hire” and “employee/hire/development”. Because of this, the hash wildcard is called the multi-level wildcard. Since the semantics of the # wildcard are that it can match zero or more partitions, “employee/#” can also match just “employee” (but in this case, the slash is meaningless, since there is no partition to separate). Typically, the multi-level wildcard is used to match a sub-tree of unknown depth. By preceding “employee” with “#/”, that is, “#/employee”, other multi-level topic strings which happen to contain bottom-level references to “employee”, such as “development/employee”, will match.
The second type of wildcard is the plus “+”. It is called the single-level wildcard since it will only match a single partition. For example, “employee/+” will match “employee/hire” but not “employee/hire/development”. Nor does it match “employee” alone, as there must be a second-level name in the topic.
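The wildcard semantics described above (“#” matching zero or more levels, “+” matching exactly one level) may be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that “#” may appear at any position in the pattern, which, as noted, some implementations do not permit.

```python
def topic_matches(pattern, topic):
    """Return True if a discrete topic falls within a subscription pattern."""
    return _match(pattern.split("/"), topic.split("/"))


def _match(pat, top):
    if not pat:
        # Pattern exhausted: match only if the topic is exhausted too.
        return not top
    head, rest = pat[0], pat[1:]
    if head == "#":
        # Multi-level wildcard: try consuming 0, 1, ... up to all remaining levels.
        return any(_match(rest, top[i:]) for i in range(len(top) + 1))
    if not top:
        return False
    if head == "+" or head == top[0]:
        # Single-level wildcard or literal level name: consume exactly one level.
        return _match(rest, top[1:])
    return False


examples = {
    ("employee/#", "employee/hire/development"): topic_matches("employee/#", "employee/hire/development"),
    ("employee/#", "employee"): topic_matches("employee/#", "employee"),
    ("#/employee", "development/employee"): topic_matches("#/employee", "development/employee"),
    ("employee/+", "employee/hire"): topic_matches("employee/+", "employee/hire"),
    ("employee/+", "employee/hire/development"): topic_matches("employee/+", "employee/hire/development"),
    ("employee/+", "employee"): topic_matches("employee/+", "employee"),
}
```

The first four entries evaluate to True and the last two to False, in line with the examples given in the text.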
In the above-described syntax, sets of topics can only be defined with the use of one or more wildcards of either type. In the absence of wildcards, different topics are specific and non-overlapping so that, using the examples above, the topic “employee” does not include the topic “employee/hire” and only contains items with a single top level reference to “employee”. Similarly, the second level topic “employee/hire” is distinct from the third level topic “employee/hire/development”. The topics do not overlap and thus are not subsets or supersets of each other.
The use of wildcards in topic definitions is restricted to subscribers. Publishers can only publish information (a “publication”) on discrete topics, which must be identified to the broker in a publish command also containing the publication itself. Subscribers, by contrast, can send subscription requests to a broker using topic sets defined by means of wildcards. As used hereafter, the term “topic set” will refer to a superset of any mixture of discrete topics and other topic sets.
Another optional feature available to subscribers is the filter. A filter is an expression, which might also include wildcards, that is applied to the content (as opposed to the topic definition) of a publication message to determine whether it matches the subscription. When a subscription is registered with the broker, in addition to specifying a topic and destination, a filter may be specified to further refine the selection of publications according to their contents. It is even possible to select publications using only filters by specifying # alone (equivalent to “all topics”) in the topic field. However, this may result in excessive network traffic as all messages arrive at the broker.
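The combination of a topic with a content filter may be sketched as follows. This Python illustration makes several assumptions: the filter is modelled as an ordinary callable rather than a query expression, message content is modelled as a dictionary, the field names are invented, and the topic field is limited to a discrete topic or “#” alone.

```python
def select(subscription, topic, content):
    """Return True if a publication should be delivered to the subscription."""
    # Topic test first: "#" alone is treated as "all topics".
    topic_ok = subscription["topic"] == "#" or subscription["topic"] == topic
    if not topic_ok:
        return False
    # The optional filter is applied to the content, not to the topic.
    content_filter = subscription.get("filter")
    return content_filter is None or bool(content_filter(content))


weather_cold = {"topic": "weather", "filter": lambda c: c["temp_c"] < 0}
anything_urgent = {"topic": "#", "filter": lambda c: c.get("urgent", False)}

publications = [
    ("weather", {"temp_c": -5}),
    ("weather", {"temp_c": 12, "urgent": True}),
    ("news", {"headline": "merger", "urgent": True}),
    ("sport", {"headline": "draw"}),
]

cold_hits = [t for t, c in publications if select(weather_cold, t, c)]
urgent_hits = [t for t, c in publications if select(anything_urgent, t, c)]
```

Note that the filter-only subscription (topic “#”) causes the filter to be evaluated against the content of every publication, which is the additional computation referred to in connection with filters.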
Another aspect of publish/subscribe is that subscribers must be free to alter their subscriptions and so a deregistration, or “unsubscribe”, request function is provided. Conventionally, this is only permitted to remove a topic from the subscription list of a subscriber if the unsubscribed topic is identical with one in the list. This keeps the list as a wholly positive list of topics of interest which can easily be tested for matches with subsequently applied publication topics. If the unsubscribe request is not identical to a listed topic for that client, it is ignored. Subscription lists can become quite long and a query facility, even if provided, requires substantial operator involvement. For these reasons, managing the list to ensure that only information of current interest is being subscribed to can become a problem.
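The conventional unsubscribe behaviour described above, in which a request is honoured only if it is identical to a listed entry, may be sketched as follows. The class and method names are hypothetical, and the sketch is written in Python purely for illustration.

```python
class SubscriptionList:
    """A wholly positive list of subscribed topics for one client."""

    def __init__(self):
        self._topics = []

    def subscribe(self, topic):
        if topic not in self._topics:
            self._topics.append(topic)

    def unsubscribe(self, topic):
        """Remove only an identical entry; otherwise ignore the request."""
        if topic in self._topics:
            self._topics.remove(topic)
            return True
        return False

    def topics(self):
        return list(self._topics)


subs = SubscriptionList()
subs.subscribe("employee/#")
subs.subscribe("news")

ignored = subs.unsubscribe("employee/hire")  # not identical to any entry
removed = subs.unsubscribe("news")           # identical entry, so removed
```

After these calls the list still contains “employee/#” in full: the request to unsubscribe from “employee/hire” has had no effect, because no identical entry exists.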
This situation is particularly a problem where the topic set of interest is a high-level one with many potential subtopics such as may be defined using wildcards, particularly multi-level wildcards. This is because attempting to unsubscribe from anything less than the complete topic set will fail. This lack of flexibility is a hindrance to efficient and targeted use of publish/subscribe techniques, as it would require the subscriber to unsubscribe from the complete topic set and then re-subscribe to a large and ill-defined number of lower-level topics. Although the use of filters does allow subscribers to further restrict messages received, this is effected only by applying a structured query to the actual message content, involving additional computation. Substantial operator involvement in the additional query process is again required. Also, as has already been noted, the broad use of wildcards and reliance on filters can still result in excess network traffic.
Publish/subscribe communications have proven well suited to message-oriented middleware products and messaging environments in which a subscription matcher component determines which published messages should be passed to specific subscribers. As mentioned above, the subscription matcher may be local to each subscriber, or may be a message broker or network of brokers located at an intermediate node or set of nodes in a network, between publishers and subscribers. Publish/subscribe solutions are also achieving increasing acceptance for Web Services notifications.
It is recognized that it is desirable to be able to effectively exclude, at the subscription/unsubscription stage, a portion of a broadly defined set of topics without removing the broad definition itself. In addition to this particular problem of broad topic set definition, it is also generally desirable that arbitrary unsubscribing and indeed subscribing to additional topics should work more efficiently.