1. Field of the Invention
The present invention relates generally to the data processing field, and more particularly, to a computer implemented method, system and computer usable program code for routing and delivering messages based on unstructured information payloads.
2. Description of the Related Art
A publish-subscribe messaging system has two types of clients: publisher clients and subscriber clients. Publisher clients generate messages, also referred to as events. Subscriber clients register a criterion, also called a subscription, that specifies the kind of information to be delivered in the future based on published messages. Publishers and subscribers are anonymous to each other, meaning that publishers do not necessarily know how many subscribers there are or where they are located, and subscribers do not necessarily know where publishers are located.
A message typically has three parts: a header, properties, and a body. A message header includes a number of predefined fields that contain values that can be used to identify and route the message. If values are needed in addition to those provided by the header fields of the message, properties can be created for the message and their values set. Message properties can be used, for example, to select messages by specifying a criterion on the property values. A message body can be used to send and receive data in many different forms. Both message properties and the message body are optional and are often left empty.
A topic-based publish-subscribe messaging system is a messaging system in which subscriptions specify topics, which are header fields of messages that subscriber clients wish to receive. A content-based publish-subscribe messaging system is a messaging system in which the messages delivered to a subscriber are selected based on the content of published messages, where that content is specified as values of message properties. The subscription criterion is a condition on message properties that can be tested on each message independent of any other message. For example, a filter may test whether “topic=stock-ticker/GE” or “Stock/IBM/trade:volume>1000” holds for a given message.
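The distinction between the two kinds of subscription can be illustrated with a minimal sketch. The class and function names below (Message, topic_subscription, property_subscription, deliver) and the sample data are hypothetical and chosen only for illustration; they are not taken from any particular messaging product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    """Hypothetical three-part message: header, optional properties, optional body."""
    header: Dict[str, str]
    properties: Dict[str, Any] = field(default_factory=dict)
    body: bytes = b""


# A subscription is modeled as a predicate tested on one message at a time.
Subscription = Callable[[Message], bool]


def topic_subscription(topic: str) -> Subscription:
    """Topic-based: match on the 'topic' header field only."""
    return lambda m: m.header.get("topic") == topic


def property_subscription(name: str, predicate: Callable[[Any], bool]) -> Subscription:
    """Content-based: match on the value of a named message property."""
    def matches(m: Message) -> bool:
        return name in m.properties and predicate(m.properties[name])
    return matches


def deliver(messages: List[Message], subscription: Subscription) -> List[Message]:
    """Stateless filtering: each message is tested independently of the others."""
    return [m for m in messages if subscription(m)]


messages = [
    Message({"topic": "stock-ticker/GE"}, {"volume": 500}),
    Message({"topic": "stock-ticker/IBM"}, {"volume": 1500}),
    Message({"topic": "stock-ticker/IBM"}, {"volume": 200}),
]

ge_only = deliver(messages, topic_subscription("stock-ticker/GE"))
big_trades = deliver(messages, property_subscription("volume", lambda v: v > 1000))
```

In this sketch neither kind of subscription ever reads the message body, which stays empty throughout.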
Content-based or topic-based publish-subscribe messaging systems are stateless systems, that is, systems in which the handling of one message does not affect the handling of any other message. These publish-subscribe (pubsub) systems are often used for applications providing dynamic information, such as real time stock quotes for Web pages. For example, a Web page using a publish-subscribe messaging system could reflect IBM stock prices as they change. Rather than the page being refreshed every time the IBM stock price changes, a pubsub filter may be specified such that changes are pushed to the Web page only when the price of the stock exceeds $100.
Content-based publish-subscribe messaging systems support only a limited filtering capability. To address this deficiency, mediations to process or transform messages may be introduced into the flow of traditional messaging middleware. However, mediations are complex to program and require external database services in order to store and access state. Further, groups of mediators are not easily combined.
Generally, mediations examine individual messages and perform their task in relation to those individual messages. However, there are some mediations or message transformations which examine multiple messages or even multiple message streams in order to perform their task. An example is a mediation that provides an “average” computation or a “join and filter” computation. SMILE technology (see “Relational Subscription Middleware for Internet-Scale Publish-Subscribe”, Yuhui Jin and Rob Strom, 2nd International Workshop on Distributed Event-Based Systems (DEBS '03), 2003) can aggregate information from multiple streams and deliver a message based on the aggregation. SMILE technology is, for example, capable of taking streams representing sales of seats on multiple airline flights and delivering a current number of available seats on the k cheapest flights to London to a subscriber.
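A mediation of this stateful kind can be sketched as follows. This is an illustration only and is not the SMILE implementation; the class name CheapestFlightsView, the message fields, and the flight identifiers are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple


class CheapestFlightsView:
    """Hypothetical stateful mediation: tracks the latest price and seat
    count per flight and derives the k cheapest flights on every update."""

    def __init__(self, k: int) -> None:
        self.k = k
        # State kept across messages: flight id -> (price, seats available).
        self._flights: Dict[str, Tuple[float, int]] = {}

    def on_message(self, flight: str, price: float,
                   seats_available: int) -> List[Tuple[str, float, int]]:
        """Fold one incoming message into the state and return the derived view."""
        self._flights[flight] = (price, seats_available)
        cheapest = heapq.nsmallest(
            self.k, self._flights.items(), key=lambda item: item[1][0]
        )
        return [(name, p, seats) for name, (p, seats) in cheapest]


view = CheapestFlightsView(k=2)
view.on_message("BA100", 450.0, 12)
view.on_message("VS4", 390.0, 3)
after_third = view.on_message("AA106", 420.0, 7)
after_sale = view.on_message("VS4", 390.0, 2)  # one seat sold on VS4
```

Unlike a stateless filter, the result for each message here depends on the messages that preceded it, which is why such mediations need somewhere to store and access state.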
Such mediations, however, operate only on simple text or numeric message attributes to provide a derived state to the subscriber. Further, message consumers typically receive only messages whose headers and properties match the selection criteria in a subscription that specifies either a single message or a history of messages. Currently, subscriptions do not select messages on the basis of the content of the message body.
With the advent of highly capable, wirelessly connected, widely distributed sensor networks, scenarios are emerging which require intelligent delivery of collected data in a timely fashion. These distributed sensor networks include sensors that capture audio and video and that can provide a wealth of data which may overlap in scope (for example, fields of view of the sensors) and coverage (for example, spatial and temporal resolution of sensors). These data provide new types of messages in which the message body contains meaningful content and which can vary in the quality of their content. While the evolution of the Web has increased information available via user pull, these new scenarios describe increased information available via push and via rich media streams. These new message types, in addition to having numeric or text data as metadata or message properties, contain unstructured information as their payload or message body.
Regardless of the content of messages, subscribers wish to receive only those messages that contain relevant data. Unlike a subscription to messages with structured payloads, a subscription to messages containing unstructured information is described less accurately if it relies only on constraints on the structured information available in the messages.
Consider the problem of a battlefield commander. The commander must keep aware of events transpiring on the battlefield. Low resolution satellite image feeds, higher resolution tank image feeds, and other multimedia information are being captured, but the commander bears the burden of sorting through all the images after they are received to obtain the most informative images. What is needed is a mechanism that will enable the commander to set up desired criteria for these multimedia messages in advance in such a way that he or she can choose to preferentially receive the most desired images. Current publish-subscribe messaging systems do not provide such a capability.
Continuing the battlefield scenario, there may be other subscribers in addition to the battlefield commander with different criteria for receiving images contained in message bodies. For example, a tank commander may want to receive images of a long view ahead of his/her tank in order to avoid ambush. This subscription must be satisfied from the same sensor data as that available to the battlefield commander; however, for this user, the criteria will be different (for example, the field of view in front of the tank as opposed to an overall view of the entire battlefield).
These various users of available sensor data would be served by specifying constraints on unstructured information in order to describe the subscription they desire. What is further needed, accordingly, is a mechanism that will provide message routing and subscription matching to users based on specified constraints of unstructured payloads. Specifically, what is needed is a mechanism for similarity matching of message payloads to subscriptions.
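One possible reading of similarity matching can be sketched as follows, assuming that some feature extractor (not shown, and hypothetical) has already reduced both the message payload and the subscriber's reference example to numeric feature vectors. Cosine similarity and the threshold value are illustrative choices, not a method prescribed by this description.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def payload_matches(payload_features: Sequence[float],
                    subscription_features: Sequence[float],
                    threshold: float) -> bool:
    """Deliver only if the payload is similar enough to the subscription's example."""
    return cosine_similarity(payload_features, subscription_features) >= threshold
```

Under this sketch, two subscribers could draw on the same sensor messages while supplying different reference vectors and thresholds.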
Consider the further example of a sensor on an oil pipeline. The sensor provides image data which is associated with metadata, such as time of image capture, location of sensor, etc. However, the payload of messages from this sensor contains far more information about the visual aspects of the field of view of the sensor. Currently, subscribers to such messages must examine all the messages or examine all the messages where the metadata fulfills a subscription specification (e.g. images taken between 11 PM and 12 PM). This becomes a problem since subscribers may receive too many messages (e.g. all messages) or too few messages (e.g. only messages that match a restrictive specification).
Furthermore, messages which fulfill a subscription specification of a subscriber may not contain data of interest to the subscriber. For example, images captured between 11 PM and 12 PM may all be identical and have no discriminating information. What is needed, accordingly, is a mechanism for specifying a subscription to images from the pipeline sensor that fulfill image criteria, such as brightness intensity or the presence of an explosion in the images.
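A criterion on the image itself, as opposed to its metadata, might look like the following minimal sketch. It assumes each message body is a raw 8-bit grayscale image held as bytes; the function names and the brightness threshold are hypothetical, and only the brightness criterion is shown (detecting an explosion would need a far more elaborate analysis).

```python
from typing import List


def mean_brightness(pixels: bytes) -> float:
    """Average intensity of an 8-bit grayscale image, in the range 0 to 255."""
    return sum(pixels) / len(pixels) if pixels else 0.0


def bright_frames(frames: List[bytes], min_brightness: float) -> List[int]:
    """Indices of the frames whose body satisfies the brightness criterion."""
    return [i for i, frame in enumerate(frames)
            if mean_brightness(frame) >= min_brightness]


# Two nearly identical dark frames and one bright frame.
frames = [bytes([10] * 16), bytes([12] * 16), bytes([240] * 16)]
selected = bright_frames(frames, min_brightness=128)
```

Here a metadata-only subscription would return all three frames, while the body-based criterion selects only the one frame that differs.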
It should be noted that such unstructured payloads may be very large, and a messaging infrastructure should avoid transmitting messages that are not needed. What is also needed, accordingly, is a mechanism for restricting not only reception but also transmission to only those messages which are needed. This will allow improved scalability.