The present invention relates generally to computer software, and more particularly, to a system and method for efficiently matching events with subscribers in a content-based publish-subscribe system.
The expansion of local and wide area computer networks has pushed computer technologies to a level at which they must adapt to a distributed environment. Computer applications can run concurrently on different nodes in a large scale network, and in this environment, a coherent multi-event management system can create synergistic results and is an essential element of the networked computers. It is known in the art that the publish-subscribe paradigm is one of the simple and efficient techniques for interconnecting applications in a distributed environment. Information providers (publishers) publish information in the form of events in a publish-subscribe system, which delivers these events to the information consumers (subscribers). The system acts as an intermediary between the publishers and subscribers and is typically implemented as a network broker that is responsible for routing events from publishers to subscribers. Most publish-subscribe systems support some mechanism by which subscribers can specify what kinds of events they are interested in receiving. In such systems, each event is categorized as belonging to a particular group. Subscribers can then indicate the groups to which they want to subscribe. The publish-subscribe system ensures that subscribers are notified of events belonging to their respective groups. These systems are also known as group based systems.
In addition to group based systems, there are content-based publish-subscribe systems. A content-based publish-subscribe system allows a subscriber to control the events of which it wishes to be notified. Events in such a system have various attributes, and subscribers can specify arbitrary boolean predicates over these attributes. A subscriber is notified of an event only if the predicates specified by the subscriber are satisfied. For example, a simple event for a stock quote could have two attributes: NAME and PRICE. A subscriber could specify the following predicate: (NAME=“XYZ”) AND (PRICE>20). That is, this subscriber would be notified of the related event only if the NAME attribute of the event is “XYZ” and its PRICE attribute is greater than 20. Compared to group based systems, content-based systems provide subscribers with great flexibility in choosing events for notification. A good example of a publish-subscribe system supporting content-based subscription is the Java Message Service, which is a messaging middleware standard that allows subscribers to specify predicates over message attributes using a syntax based on a subset of SQL92.
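The stock quote example above can be illustrated with a minimal Python sketch. The nested-tuple representation of predicates, the `OPS` table, and the `evaluate` function are assumptions made for illustration only; they are not part of any particular publish-subscribe system or of the Java Message Service.

```python
import operator

# Hypothetical predicate representation (illustration only):
#   ("test", attribute, op, constant)  an atomic comparison
#   ("and", p, q), ("or", p, q), ("not", p)  boolean combinations
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def evaluate(pred, event):
    """Return True if the event (a dict of attributes) satisfies the predicate."""
    kind = pred[0]
    if kind == "test":
        _, attr, op, const = pred
        # An event that lacks the attribute does not satisfy the test.
        return attr in event and OPS[op](event[attr], const)
    if kind == "and":
        return evaluate(pred[1], event) and evaluate(pred[2], event)
    if kind == "or":
        return evaluate(pred[1], event) or evaluate(pred[2], event)
    if kind == "not":
        return not evaluate(pred[1], event)
    raise ValueError(f"unknown predicate node: {kind!r}")


# (NAME = "XYZ") AND (PRICE > 20)
subscription = ("and",
                ("test", "NAME", "=", "XYZ"),
                ("test", "PRICE", ">", 20))

quote = {"NAME": "XYZ", "PRICE": 25}
notify = evaluate(subscription, quote)  # the subscriber is notified only if True
```

Under these assumptions, an event with NAME “XYZ” and PRICE 25 satisfies the subscription, while the same event with PRICE 20 does not, because the comparison is strict.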
Despite the advantages of content-based publish-subscribe systems, an important problem in designing and implementing such a system is the event-subscriber matching problem. In a networked environment, given an event and a set of subscribers, the problem is to determine, as efficiently as possible, the subset of the subscribers that “match” the event, i.e., those subscribers whose predetermined predicates are satisfied by the given published event.
A conventional approach is to test the event against the predetermined predicates specified by each subscriber, one subscriber at a time, until all the predicates have been tested. Such an approach is “linear” in the number of subscriptions and does not scale. A large system may have thousands of subscribers and millions of events at any moment, and the time spent matching the events with the subscribers can be significant.
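The linear approach can be sketched as follows. This is only an illustration: the function name `match_linear`, the subscriber identifiers, and the use of Python callables as predicates are all assumptions, not a description of any existing system.

```python
def match_linear(event, subscriptions):
    """Test the event against every subscription in turn.

    The work per event grows in proportion to the number of
    subscriptions, which is why this approach does not scale.
    """
    matched = []
    for subscriber_id, predicate in subscriptions.items():
        if predicate(event):
            matched.append(subscriber_id)
    return matched


# Hypothetical subscriptions, each an arbitrary boolean predicate.
subscriptions = {
    "s1": lambda e: e["NAME"] == "XYZ" and e["PRICE"] > 20,
    "s2": lambda e: e["NAME"] == "ABC",
    "s3": lambda e: e["PRICE"] > 50 or e["NAME"] == "XYZ",
}

matches = match_linear({"NAME": "XYZ", "PRICE": 25}, subscriptions)
```

Every published event triggers one predicate evaluation per subscriber, even when many subscriptions share the same tests, such as the test on NAME in "s1" and "s3" above.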
Some experts in the industry suggest a solution to the matching problem in which subscriptions are organized into a matching tree, whose traversal yields a set of subscribers matching a particular event. See Marcos K. Aguilera, Robert E. Strom, Daniel C. Sturman, Mark Astley, and Tushar D. Chandra, Matching Events in a Content-Based Subscription System, Principles of Distributed Systems (1999). However, in the Matching Events article, subscriptions are limited to conjunctions of atomic tests. The teaching of this article is based on the premise that any boolean predicate can be transformed into a disjunction of conjunctions. For example, a simple test (A OR B) can be transformed into (A AND B) OR (A AND NOT B) OR (NOT A AND B).
Transforming an arbitrary boolean predicate into such a form, as in the above example, can be extensive and costly in terms of time and processing capacity. Moreover, a Directed Acyclic Graph (DAG) constructed for the original test can grow exponentially because the transformation increases the number of tests.
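The potential for exponential growth can be illustrated with a small sketch. It assumes a hypothetical nested-tuple representation of predicates and converts them by distributing AND over OR; this differs from the canonical form used in the (A OR B) example above, and it is not the procedure of the cited article. The names `to_dnf` and `chain` are introduced here for illustration only.

```python
from itertools import product


def to_dnf(pred):
    """Convert a predicate to a disjunction of conjunctions.

    The result is a list of conjunctions; each conjunction is a
    frozenset of (variable, polarity) literals. Contradictory
    conjunctions are not pruned in this sketch.
    """
    kind = pred[0]
    if kind == "var":
        return [frozenset([(pred[1], True)])]
    if kind == "not":
        inner = pred[1]
        if inner[0] == "var":
            return [frozenset([(inner[1], False)])]
        if inner[0] == "not":  # double negation
            return to_dnf(inner[1])
        if inner[0] == "and":  # De Morgan
            return to_dnf(("or", ("not", inner[1]), ("not", inner[2])))
        if inner[0] == "or":   # De Morgan
            return to_dnf(("and", ("not", inner[1]), ("not", inner[2])))
    if kind == "or":
        return to_dnf(pred[1]) + to_dnf(pred[2])
    if kind == "and":
        # Distribution: every left conjunction pairs with every right one.
        return [l | r for l, r in product(to_dnf(pred[1]), to_dnf(pred[2]))]
    raise ValueError(f"unknown predicate node: {kind!r}")


def chain(n):
    """Build (A1 OR B1) AND (A2 OR B2) AND ... AND (An OR Bn)."""
    pred = ("or", ("var", "A1"), ("var", "B1"))
    for i in range(2, n + 1):
        pred = ("and", pred, ("or", ("var", f"A{i}"), ("var", f"B{i}")))
    return pred


# The original predicate has 2*n atomic tests; its expansion has 2**n conjunctions.
sizes = {n: len(to_dnf(chain(n))) for n in (1, 2, 5, 10)}
```

In this sketch a predicate with only 20 atomic tests (n = 10) expands into 1024 conjunctions, each of which would have to be treated as a separate conjunctive subscription.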
Furthermore, conventional binary decision diagrams and If-Then-Else DAGs primarily address the problem of finding an efficient representation for boolean expressions (including sub-expressions), and they are widely used in the design and verification of logic circuits. When these techniques are applied to construct DAGs, the approach is largely bottom-up, and the emphasis is on sharing all possible sub-expressions or low level expressions.
Although such a representation could be used to solve the matching problem, such an approach would still be linear. In contrast, sharing sub-predicates that are common prefixes is likely to result in sub-linear complexity.
What is needed is an efficient method to solve the matching problem for subscriptions that are arbitrary boolean predicates, which can make use of the standard boolean operators AND, OR and NOT and parentheses, in a content-based publish-subscribe system situated in a distributed network environment.