1. Technical Field
The present invention relates to event processing systems and methods and, more particularly, to content-based routing techniques for publish-subscribe or stream processing systems.
2. Description of the Related Art
Data are increasingly generated in digital form by sources such as sensors, satellites, audio and video channels, and stock feeds. Data from such sources are typically communicated as they are generated, i.e., as a data stream or message stream. There is a growing need to extract information from these streams on a continuous basis in order to detect abnormal activity and other interesting phenomena.
Publish-subscribe (pub-sub) systems provide mechanisms to route messages to interested consumers. A key means of lessening the burden on the processing and communications infrastructure is content-based routing, which enables consumers to specify (i.e., subscribe to) the messages they wish to receive.
In traditional pub-sub systems such as the one described in U.S. Pat. No. 5,557,798, messages are published to a named channel. Subscriptions may be expressed using channel names or by employing publisher-defined or system-defined attributes. For example, the JMS Specification (Version 1.0.2b, Aug. 27, 2001) of SUN MICROSYSTEMS™ describes three kinds of message attributes (called properties in that specification): (1) application-specific properties, (2) standard properties (i.e., optional header fields), and (3) provider-specific properties. In all cases, either the publisher or the infrastructure defines these properties, and the properties are transmitted as part of the message. The subscriber defines a JMS message selector (an expression over the message header properties) that specifies the messages in which the subscriber is interested. A message broker is responsible for accepting subscriptions from consumers and messages from producers, and for inspecting the message properties to determine to which consumers each message should be routed.
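The broker behavior described above can be illustrated with a minimal sketch. It is not the JMS API: JMS selectors are strings in a SQL-like syntax, whereas here a plain Python predicate stands in for the selector, and the names `PropertyMessage`, `PropertyBroker`, the topic `quotes`, and the properties `symbol` and `price` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PropertyMessage:
    """A message whose properties travel with it (hypothetical type)."""
    topic: str
    properties: Dict[str, object]
    body: str


# Assumption: a selector is modeled as a predicate over the properties,
# standing in for a JMS selector expression.
Selector = Callable[[Dict[str, object]], bool]


class PropertyBroker:
    """Accepts subscriptions and messages, and routes by topic and properties."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, str, Selector]] = []
        self.delivered: Dict[str, List[PropertyMessage]] = {}

    def subscribe(self, consumer: str, topic: str,
                  selector: Selector = lambda props: True) -> None:
        self._subscriptions.append((consumer, topic, selector))
        self.delivered.setdefault(consumer, [])

    def publish(self, message: PropertyMessage) -> None:
        # Inspect the properties to decide which consumers receive the message.
        for consumer, topic, selector in self._subscriptions:
            if topic == message.topic and selector(message.properties):
                self.delivered[consumer].append(message)


broker = PropertyBroker()
broker.subscribe(
    "c1", "quotes",
    lambda p: p.get("symbol") == "IBM" and p.get("price", 0) > 100,
)
broker.publish(PropertyMessage("quotes", {"symbol": "IBM", "price": 120}, "tick-1"))
broker.publish(PropertyMessage("quotes", {"symbol": "IBM", "price": 90}, "tick-2"))
broker.publish(PropertyMessage("quotes", {"symbol": "XYZ", "price": 500}, "tick-3"))

delivered_bodies = [m.body for m in broker.delivered["c1"]]
```

In this sketch only `tick-1` satisfies the subscription of c1. The routing decision depends solely on properties that the publisher attached to the message.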
There has also been work on mediators in pub-sub and messaging systems such as the IBM® WebSphere® Application Server v 6.0. A mediator is a piece of code that is always associated with a destination or a subscriber. Mediation code operates on a message as the message traverses that destination. The two main functions of mediators are: (1) Transforming the message data from one message content format to another. This is especially important if the sender and the receiver of a message do not support exactly the same message format. A mediator can be written to perform the necessary transformation using, for example, an XSLT stylesheet. (2) Making routing decisions. A mediator can read the content of a message and, based on this content, route the message to different destinations.
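The two mediator functions can be sketched as follows. This is an illustrative model only and does not reflect the WebSphere mediation interfaces; the class `MediatedDestinations`, the function `temperature_mediator`, the destination names, and the 50-degree threshold are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Msg = Dict[str, object]
# Assumption: a mediator maps one incoming message to (destination, message) pairs.
Mediator = Callable[[Msg], List[Tuple[str, Msg]]]


class MediatedDestinations:
    """Destinations with an optional mediator attached to each (hypothetical)."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[Msg]] = {}
        self.mediators: Dict[str, Mediator] = {}

    def attach(self, destination: str, mediator: Mediator) -> None:
        self.mediators[destination] = mediator

    def send(self, destination: str, message: Msg) -> None:
        mediator = self.mediators.get(destination)
        if mediator is None:
            self.queues.setdefault(destination, []).append(message)
            return
        # The mediator operates on the message as it traverses the destination.
        for target, out in mediator(message):
            self.queues.setdefault(target, []).append(out)


def temperature_mediator(message: Msg) -> List[Tuple[str, Msg]]:
    # (1) Transformation: the sender reports Fahrenheit, the receivers expect Celsius.
    celsius = (float(message["temp_f"]) - 32) * 5 / 9
    out = {"sensor": message["sensor"], "temp_c": celsius}
    # (2) Routing: the destination is chosen from the message content.
    target = "alerts" if celsius > 50 else "readings"
    return [(target, out)]


system = MediatedDestinations()
system.attach("sensor-input", temperature_mediator)
system.send("sensor-input", {"sensor": "s1", "temp_f": 212})
system.send("sensor-input", {"sensor": "s2", "temp_f": 68})
```

Here the message from `s1` lands on `alerts` and the message from `s2` on `readings`, each already converted to the receiver's format.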
Referring to FIGS. 1A-1E, consider the case of three message producers (p1, p2, p3). In this scenario, each producer (p1, p2, p3) is a sensor monitoring an entity and publishing messages to its own topic (t1, t2, t3, respectively). Alternatively, p1-p3 could publish to a single topic and identify the producer via a message property. In this example, the t1-t3 scenario will be used. For the consumer (c1), the streams are equivalent (i.e., they report on the same entity and vary only by the “quality” of measurements, where quality is some application-defined metric, such as signal-to-noise ratio), and thus c1 need only process one of the streams. An entity e1 evaluates the quality of the streams.
In traditional pub-sub systems, e1 would likely be deployed as part of the consumer c1 (see FIG. 1B). Thus, c1 would likely subscribe to all three streams, evaluate the quality of each stream, and select the stream with the highest quality for further processing. If the selected stream ceased to be the highest-quality input, c1 would switch to the appropriate stream. The problem with this solution is that each consumer with the same requirement must receive all three streams and perform this same evaluation.
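The consumer-side arrangement of FIG. 1B can be sketched as follows, under the assumption that each message carries a numeric quality value and that the consumer treats the most recently reported quality of each topic as that stream's quality. The class name `SelectingConsumer` and the sample values are hypothetical.

```python
class SelectingConsumer:
    """Consumer with the evaluation logic of e1 embedded in it (FIG. 1B style)."""

    def __init__(self):
        self.latest_quality = {}   # topic -> most recently reported quality
        self.received = 0          # every message delivered to the consumer
        self.processed = []        # messages from the currently best stream

    def on_message(self, topic, quality, payload):
        self.received += 1
        self.latest_quality[topic] = quality
        # Select the stream whose latest reported quality is highest.
        best = max(self.latest_quality, key=self.latest_quality.get)
        if topic == best:
            self.processed.append((topic, payload))


# Two rounds of readings; t1 degrades in the second round.
feed = [
    ("t1", 0.9, "a1"), ("t2", 0.5, "a2"), ("t3", 0.7, "a3"),
    ("t1", 0.4, "b1"), ("t2", 0.5, "b2"), ("t3", 0.8, "b3"),
]

c1 = SelectingConsumer()
for topic, quality, payload in feed:
    c1.on_message(topic, quality, payload)
```

In this run c1 receives six messages but processes only two, `a1` from t1 and then `b3` from t3 after t1 degrades. Every additional consumer with the same requirement would receive and evaluate all six messages again.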
This duplicate processing and transmission can be avoided, as in FIG. 1C, by deploying e1 as a consumer or as a mediator that subscribes to all three streams and publishes the messages of the stream with the highest quality (e1 might do so by publishing to a new topic, t4, or by adding a quality property to messages from the selected stream). The problem in this case is duplicate transmission of the highest-quality stream (i.e., that stream is published both by the original producer and by e1).
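The FIG. 1C arrangement, and the duplicate transmission it causes, can be sketched as follows. The classes `CountingBus` and `RepublishingEvaluator` are hypothetical, and quality is again assumed to be a number carried with each message.

```python
from collections import defaultdict


class CountingBus:
    """Minimal topic bus that counts publications per topic (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> callbacks
        self.published = defaultdict(int)     # topic -> number of publications

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, quality, payload):
        self.published[topic] += 1
        for callback in list(self.subscribers[topic]):
            callback(topic, quality, payload)


class RepublishingEvaluator:
    """e1 deployed on its own: subscribes to the inputs, republishes the best."""

    def __init__(self, bus, inputs, output):
        self.bus = bus
        self.output = output
        self.latest = {}
        for topic in inputs:
            bus.subscribe(topic, self.on_message)

    def on_message(self, topic, quality, payload):
        self.latest[topic] = quality
        if topic == max(self.latest, key=self.latest.get):
            # Second transmission of a message the producer already published.
            self.bus.publish(self.output, quality, payload)


bus = CountingBus()
e1 = RepublishingEvaluator(bus, ["t1", "t2", "t3"], "t4")
c1_inbox = []
bus.subscribe("t4", lambda topic, quality, payload: c1_inbox.append(payload))

for topic, quality, payload in [
    ("t1", 0.9, "a1"), ("t2", 0.5, "a2"), ("t3", 0.7, "a3"),
    ("t1", 0.4, "b1"), ("t2", 0.5, "b2"), ("t3", 0.8, "b3"),
]:
    bus.publish(topic, quality, payload)

total_publications = sum(bus.published.values())
```

The consumer now receives only the two selected messages, but the bus carries eight publications for six original messages, because each selected message is published once by its producer and once by e1.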
This duplicate transmission can be avoided by having e1 transmit its evaluation either to c1 (FIG. 1D), so that c1 alters its subscription accordingly, or to p1-p3 (FIG. 1E), so that the producers append the evaluation as a property to the stream (c1's subscription would then specify the property representing the highest quality). The problem in both of these cases is that e1's communication with either c1 or p1-p3 is application-specific; i.e., because the system provides no services to address this, the application providers must develop their own signaling mechanism to reflect changes in interest.
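The FIG. 1D variant can be sketched to show what such an application-specific signaling mechanism might look like. Everything here is an assumption made for illustration: the control topic `control.c1`, the classes `SignalBus`, `SwitchingConsumer`, and `SignalingEvaluator`, and the convention that a signal carries the name of the best topic.

```python
from collections import defaultdict


class SignalBus:
    """Minimal topic bus with subscribe and unsubscribe (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        if callback in self.subscribers[topic]:
            self.subscribers[topic].remove(callback)

    def publish(self, topic, message):
        for callback in list(self.subscribers[topic]):
            callback(topic, message)


class SwitchingConsumer:
    """c1: changes its data subscription when e1 signals a new best stream."""

    def __init__(self, bus, control):
        self.bus = bus
        self.current = None
        self.inbox = []
        bus.subscribe(control, self.on_signal)

    def on_signal(self, _topic, best_topic):
        if best_topic == self.current:
            return
        if self.current is not None:
            self.bus.unsubscribe(self.current, self.on_data)
        self.bus.subscribe(best_topic, self.on_data)
        self.current = best_topic

    def on_data(self, _topic, message):
        _quality, payload = message
        self.inbox.append(payload)


class SignalingEvaluator:
    """e1: watches all inputs and signals only when the best stream changes."""

    def __init__(self, bus, inputs, control):
        self.bus = bus
        self.control = control
        self.latest = {}
        self.best = None
        for topic in inputs:
            bus.subscribe(topic, self.on_data)

    def on_data(self, topic, message):
        quality, _payload = message
        self.latest[topic] = quality
        best = max(self.latest, key=self.latest.get)
        if best != self.best:
            self.best = best
            self.bus.publish(self.control, best)  # application-defined signal


bus = SignalBus()
c1 = SwitchingConsumer(bus, "control.c1")
e1 = SignalingEvaluator(bus, ["t1", "t2", "t3"], "control.c1")

for topic, message in [
    ("t1", (0.9, "a1")), ("t2", (0.5, "a2")), ("t3", (0.7, "a3")),
    ("t1", (0.4, "b1")), ("t2", (0.5, "b2")), ("t3", (0.8, "b3")),
]:
    bus.publish(topic, message)
```

No message is transmitted twice in this sketch, but the control topic, the signal format, and the resubscription logic all had to be written by the application. In this run c1 never sees `a1` and still receives the degraded `b1`, because it holds a data subscription only between one signal and the next.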