Today's computer data networks span the globe and provide an ever-increasing variety of information and types of information. A popular model for retrieving information is the request-response model. This model is used, for example, by the World Wide Web: a Web client requests a Web page from a Web server and then waits until the Web server responds. This model is adequate for basic access to information, but as information consumers become more sophisticated, it quickly becomes inefficient for information consumers, information providers, or both. For example, under the request-response model, a consumer interested only in changes to an item of information (e.g., a stock price) may have to request the information over and over again until a change is detected in the response, so that most requests and responses carry nothing new.
A model complementary to request-response that is becoming increasingly popular is publish/subscribe. Under a publish/subscribe model, information consumers register with a publish/subscribe service subscriptions that describe the events they are interested in. Then, whenever information providers publish events to the service, each consumer is notified of the events that match its subscriptions. News alerts and stock quotes are classic examples of information suited to distribution via a publish/subscribe model. Other applications that use a publish/subscribe model include instant messaging, online auctions, and electronic commerce price databases.
In addition, new applications are emerging in which software agents play the role of information consumer, for example, communicating with sensors and devices to perform automation tasks, or monitoring and executing routine business-to-business transactions. Software agents present additional scalability challenges to the design of a publish/subscribe system because they can greatly increase the total number of subscribers, they can handle very complex subscriptions, and they can receive and process notifications at a very high rate.
Early publish/subscribe systems used a flat channel subscription model. Information consumers subscribed to a named channel and received only the events that an information provider published to that particular channel. An improvement over the flat channel subscription model is to arrange the channels into a hierarchy of topics and subtopics, so that a subscriber to a topic receives any event published to the topic or to any of its subtopics. Modern publish/subscribe systems allow even more fine-grained selection of events by enabling subscriptions based on the content of an event.
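The difference between flat channels and a topic hierarchy comes down to the matching rule applied to each published event. The sketch below assumes that topics are written as slash-separated paths such as `stocks/nasdaq`; that naming convention and the function names are hypothetical and not taken from any particular system.

```python
# Hypothetical topic naming: slash-separated paths such as "stocks/nasdaq".
# Both matching rules are illustrative sketches, not a specific product's API.


def flat_match(subscribed_channel: str, published_channel: str) -> bool:
    """Flat channel model: an event is delivered only on an exact channel match."""
    return subscribed_channel == published_channel


def hierarchical_match(subscribed_topic: str, published_topic: str) -> bool:
    """Topic hierarchy: a subscription to a topic also covers all of its subtopics."""
    subscribed_parts = subscribed_topic.split("/")
    published_parts = published_topic.split("/")
    # Compare whole path segments so that "stock" does not match "stocks/nasdaq".
    return published_parts[: len(subscribed_parts)] == subscribed_parts


if __name__ == "__main__":
    print(flat_match("stocks", "stocks/nasdaq"))          # False
    print(hierarchical_match("stocks", "stocks"))         # True
    print(hierarchical_match("stocks", "stocks/nasdaq"))  # True
    print(hierarchical_match("stock", "stocks/nasdaq"))   # False
```

Comparing path segments rather than raw string prefixes is the design choice that keeps sibling topics with similar names from leaking into each other.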
A content-based publish/subscribe system specifies an event schema for a topic, which lists the names and types of the attributes that appear in an event. A subscription filter associated with a subscription may then be specified as a conjunction of predicates on a subset of those attributes. For example, a “stock quotes” topic might specify an event schema with three attributes: Symbol, Price, and Volume. An example event is (Symbol=MSFT and Price=79.30 and Volume=40,000,000); an example subscription filter is (Symbol=MSFT and Price>80.00), which this event does not match because its price is below 80.00. In a further example, the topic is itself an attribute of the event schema (e.g., Topic, Symbol, Price, and Volume), so that subscribing to a topic or subtopic becomes a special case of the more general content-based subscription mechanism.
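A minimal sketch of the schema and conjunctive filter matching described above. The dictionary representation of events, the `(attribute, comparison, constant)` predicate tuples, and the names `conforms`, `matches`, and `msft_above_80` are assumptions made for illustration; the example event and filter are the ones given in the text.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Event schema for the "stock quotes" topic: attribute names and their types.
STOCK_QUOTE_SCHEMA: Dict[str, type] = {"Symbol": str, "Price": float, "Volume": int}

# A predicate is (attribute, comparison, constant); a filter is a conjunction of them.
Predicate = Tuple[str, Callable[[Any, Any], bool], Any]


def conforms(event: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """True if the event has exactly the schema's attributes with the declared types."""
    return set(event) == set(schema) and all(
        isinstance(event[name], attr_type) for name, attr_type in schema.items()
    )


def matches(event: Dict[str, Any], subscription_filter: List[Predicate]) -> bool:
    """An event matches only if every predicate in the conjunction holds."""
    return all(
        name in event and compare(event[name], constant)
        for name, compare, constant in subscription_filter
    )


# The example filter from the text: (Symbol=MSFT and Price>80.00).
msft_above_80: List[Predicate] = [
    ("Symbol", operator.eq, "MSFT"),
    ("Price", operator.gt, 80.00),
]

if __name__ == "__main__":
    quote = {"Symbol": "MSFT", "Price": 79.30, "Volume": 40_000_000}
    print(conforms(quote, STOCK_QUOTE_SCHEMA))                # True
    print(matches(quote, msft_above_80))                      # False: 79.30 is not > 80.00
    print(matches({**quote, "Price": 80.25}, msft_above_80))  # True
```

Following the text's further example, the topic could be folded in by adding a `Topic` attribute to the schema and a predicate such as `("Topic", operator.eq, "stock quotes")` to the filter, so that topic subscription becomes one more equality test.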
A new content-based publish/subscribe service typically begins with a single physical server that receives and stores subscriptions from each subscriber, receives events from each publisher, matches each event against the stored subscriptions, and sends notifications to the subscribers whose subscriptions match. However, a successful service will eventually require performance beyond the capabilities of a single physical server. Such a service requires a network of physical servers and/or a distributed system architecture.
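The four duties of the single server can be sketched as one class. This is a hypothetical, in-memory illustration: `SingleServerService`, the subscriber ids, and the `sent` list (a stand-in for actually delivering notifications over a network) are not taken from any particular system.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Predicate = Tuple[str, Callable[[Any, Any], bool], Any]
Event = Dict[str, Any]


class SingleServerService:
    """Hypothetical single-server content-based publish/subscribe service."""

    def __init__(self) -> None:
        # Stored subscriptions: (subscriber id, conjunctive filter).
        self.subscriptions: List[Tuple[str, List[Predicate]]] = []
        # Stand-in for the network: notifications are recorded here instead of sent.
        self.sent: List[Tuple[str, Event]] = []

    def subscribe(self, subscriber: str, subscription_filter: List[Predicate]) -> None:
        """Receive and store a subscription."""
        self.subscriptions.append((subscriber, subscription_filter))

    def publish(self, event: Event) -> List[str]:
        """Receive an event, match it against every stored subscription, notify matches."""
        notified = []
        for subscriber, subscription_filter in self.subscriptions:
            if all(
                name in event and compare(event[name], constant)
                for name, compare, constant in subscription_filter
            ):
                self.sent.append((subscriber, event))
                notified.append(subscriber)
        return notified


if __name__ == "__main__":
    service = SingleServerService()
    service.subscribe("alice", [("Symbol", operator.eq, "MSFT"), ("Price", operator.gt, 80.00)])
    service.subscribe("bob", [("Symbol", operator.eq, "MSFT")])
    print(service.publish({"Symbol": "MSFT", "Price": 79.30, "Volume": 40_000_000}))  # ['bob']
```

In this sketch every published event is compared against every stored subscription, and every match produces a notification, so the work per event grows with the number of subscriptions and subscribers. That growth is one way to see why a single server eventually runs out of capacity.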
Some prior art systems have incorporated a network of physical servers by propagating each event published to the service to every physical server in the network, but this technique is inherently inefficient: every server must receive and process every event, including events that match none of the subscriptions it hosts. Other prior art systems have achieved better efficiency by using a precise subscription filter summary. In such systems, each physical server that hosts subscriptions calculates a precise summary of the subscription filters associated with those subscriptions. The precise filter summary is then propagated upstream, against the flow of events, and used by upstream event routers to block unnecessary event traffic as early in the route as possible.
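The contrast between flooding and summary-based routing can be sketched as follows. The representation is an assumption: a filter is a conjunction of predicates, and a precise summary is taken to be the disjunction of all distinct filters a server hosts (only exact duplicates are merged). The server names and function names are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Predicate = Tuple[str, Callable[[Any, Any], bool], Any]
Filter = List[Predicate]   # conjunction: every predicate must hold
Summary = List[Filter]     # disjunction: admits an event if any filter matches
Event = Dict[str, Any]


def matches(event: Event, subscription_filter: Filter) -> bool:
    """True if the event satisfies every predicate in the filter."""
    return all(name in event and compare(event[name], constant)
               for name, compare, constant in subscription_filter)


def precise_summary(hosted_filters: List[Filter]) -> Summary:
    """Hypothetical precise summary: the disjunction of every distinct hosted filter."""
    summary: Summary = []
    for subscription_filter in hosted_filters:
        if subscription_filter not in summary:  # only exact duplicates are merged
            summary.append(subscription_filter)
    return summary


def flood(servers: List[str]) -> List[str]:
    """Flooding: every event is sent to every server, whether or not it matches."""
    return list(servers)


def route(event: Event, summaries: Dict[str, Summary]) -> List[str]:
    """Upstream router: forward the event only to servers whose summary admits it."""
    return [server for server, summary in summaries.items()
            if any(matches(event, f) for f in summary)]


if __name__ == "__main__":
    summaries = {
        "server-a": precise_summary([[("Symbol", operator.eq, "MSFT")],
                                     [("Symbol", operator.eq, "MSFT")]]),
        "server-b": precise_summary([[("Symbol", operator.eq, "IBM")]]),
    }
    quote = {"Symbol": "MSFT", "Price": 79.30, "Volume": 40_000_000}
    print(flood(list(summaries)))   # ['server-a', 'server-b']
    print(route(quote, summaries))  # ['server-a']
```

Because `route` may evaluate every filter in a summary for every event, the router's per-event cost in this sketch grows with the number of distinct filters hosted downstream.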
There are problems with prior art systems that use precise subscription filter summaries. One problem is that, in practice, a precise subscription filter summary becomes so complex that the event routers evaluating it become a system bottleneck, degrading overall system throughput. Another problem is that the subscription filters associated with the subscriptions hosted by a server sometimes have poor locality, that is, they select events from widely separated regions of the event space. When that is the case, a summary of the subscription filters is too broad to be effective in reducing event traffic.
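The locality problem can be illustrated with a single attribute. The paragraph does not say how a summary is computed; the sketch assumes a deliberately compact one, the smallest single Price interval covering every hosted range, which is a swapped-in technique for illustration only. All names and numbers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical: each hosted filter constrains Price to a closed range [low, high].
Interval = Tuple[float, float]


def covering_interval(ranges: List[Interval]) -> Interval:
    """Compact summary: the smallest single interval that covers every hosted range."""
    return (min(low for low, _ in ranges), max(high for _, high in ranges))


def admitted_but_unmatched(price: float, ranges: List[Interval]) -> bool:
    """True if the summary forwards the event although no hosted filter wants it."""
    low, high = covering_interval(ranges)
    return low <= price <= high and not any(lo <= price <= hi for lo, hi in ranges)


def summary_precision(ranges: List[Interval]) -> float:
    """Fraction of the summary's width actually covered by some filter
    (assumes the hosted ranges do not overlap)."""
    low, high = covering_interval(ranges)
    if high == low:
        return 1.0  # degenerate summary: a single price point
    return sum(hi - lo for lo, hi in ranges) / (high - low)


if __name__ == "__main__":
    good_locality = [(79.0, 80.0), (80.0, 81.0)]   # filters cluster around one price
    poor_locality = [(1.0, 2.0), (999.0, 1000.0)]  # filters far apart
    print(summary_precision(good_locality))              # 1.0
    print(summary_precision(poor_locality) < 0.01)       # True
    print(admitted_but_unmatched(500.0, poor_locality))  # True
```

With good locality the covering interval wastes nothing, while with poor locality almost the whole interval is empty, so the summary forwards events (such as Price=500.00 here) that no hosted subscription wants.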
To ensure continuing success for content-based publish/subscribe services, there is a need in the art to solve such problems.