The present invention relates to information processing systems, and in particular relates to techniques for optimizing information processing paths within these systems.
Messaging middleware, by supporting general messaging and message transformation between network nodes, facilitates the integration of the distributed components of a processing application. Middleware collects messages from information producers ("publishers"), filters and transforms the messages as necessary, and disseminates the applicable messages to interested information consumers ("subscribers"). This type of system is known as a publish/subscribe system. Publish/subscribe middleware systems are therefore important for interconnecting the heterogeneous, distributed components of large information networks that rely on message communication, in domains such as finance, process automation, and transportation.
An exemplary publish/subscribe system may be represented by an Information Flow Graph ("IFG") consisting of a set of information publishing nodes, which emit event messages; a set of information subscribing nodes, which read event messages; and a set of intermediate nodes, where messages arriving from streams along different input arcs are interleaved into a single outgoing stream, a replica of which is sent along each outgoing arc. The nodes may be connected by arcs of at least two kinds:
Selection arcs, which "filter" messages flowing from one node to another by passing along only those messages that satisfy a predicate; and
Transformation arcs, which "transform" messages flowing from one node to another by adding, dropping, and/or recomputing the fields of each message.
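The three kinds of IFG elements described above can be sketched as plain functions over message streams. This is a minimal illustration under stated assumptions: a message is modeled as a flat field-to-value dictionary, streams are finite Python lists, and the function names `select_arc`, `transform_arc`, and `intermediate_node` are hypothetical, not taken from the disclosure.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, List

Message = Dict[str, object]  # assumption: a message is a flat field -> value mapping


def select_arc(predicate: Callable[[Message], bool], stream: Iterable[Message]) -> List[Message]:
    """Selection arc: pass along only the messages that satisfy the predicate."""
    return [m for m in stream if predicate(m)]


def transform_arc(fn: Callable[[Message], Message], stream: Iterable[Message]) -> List[Message]:
    """Transformation arc: replace each message by fn(message), which may add, drop, or recompute fields."""
    return [fn(m) for m in stream]


def intermediate_node(inputs: List[List[Message]], n_outputs: int) -> List[List[Message]]:
    """Interleave the input streams into one stream and send a replica along each outgoing arc."""
    # Concatenation is used here as one simple interleaving; it preserves the order within each input stream.
    merged = list(chain.from_iterable(inputs))
    return [list(merged) for _ in range(n_outputs)]


quotes = [{"sym": "IBM", "px": 101.0}, {"sym": "XYZ", "px": 5.0}]
ibm_only = select_arc(lambda m: m["sym"] == "IBM", quotes)
with_cents = transform_arc(lambda m: {**m, "px_cents": int(m["px"] * 100)}, ibm_only)
print(with_cents)  # [{'sym': 'IBM', 'px': 101.0, 'px_cents': 10100}]
```

A live middleware system would process unbounded streams incrementally (for example, with generators or callbacks); lists are used only to keep the sketch easy to inspect.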
A publish/subscribe system may comprise a large network, or "graph," of nodes and arcs, with many possible paths from each publishing node to each subscribing node. Each path may consist of many selection and transformation arcs, resulting in a particular sequence of operations between each publisher/subscriber node pair.
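To make the notion of "a particular sequence of operations between each publisher/subscriber node pair" concrete, the sketch below enumerates every path through a small graph and records the operations met along it. It assumes an acyclic graph and encodes each arc as a hypothetical `(source, target, label)` tuple; the labels are illustrative strings, not executable operations.

```python
from typing import Dict, List, Tuple

# Assumption: an arc is (source node, target node, operation label); labels are illustrative strings.
Arc = Tuple[str, str, str]


def operation_sequences(arcs: List[Arc], publisher: str, subscriber: str) -> List[List[str]]:
    """Return the sequence of arc operations along every path from publisher to subscriber.

    Assumes the graph is acyclic; a cyclic graph would need a visited set.
    """
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for src, dst, op in arcs:
        outgoing.setdefault(src, []).append((dst, op))

    sequences: List[List[str]] = []

    def walk(node: str, ops: List[str]) -> None:
        if node == subscriber:
            sequences.append(ops)
            return
        for nxt, op in outgoing.get(node, []):
            walk(nxt, ops + [op])

    walk(publisher, [])
    return sequences


# Two paths from P to S that apply the same two operations in different orders.
graph = [
    ("P", "A", "select:region=EU"),
    ("A", "S", "transform:usd_to_eur"),
    ("P", "B", "transform:usd_to_eur"),
    ("B", "S", "select:region=EU"),
]
print(operation_sequences(graph, "P", "S"))
# [['select:region=EU', 'transform:usd_to_eur'], ['transform:usd_to_eur', 'select:region=EU']]
```

The number of such paths can grow exponentially with graph size, which is one reason a rewriting approach that works on operation sequences, rather than on the whole graph at once, is attractive.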
Using subject-based publish/subscribe systems as a starting point, recent advances include:
1. Content-based publish/subscribe. Rather than treating events as uninterpreted data with a single "subject" field, content-based systems associate schemas with event streams and express subscriptions as predicates over all fields of the event.
2. Stateless event transformations. To support scenarios in which events from multiple publishers are similar but not identical, events may be subjected to transform operations. These operations are stateless in the sense that their results do not depend on prior events.
3. Event stream interpretation. To support subscribers who are interested not only in published events but also in derived events, such as summaries, trends, and alarms computed from a sequence of related events, "stateful" operations (operations whose results depend on the event history) are supported. State can also be used to express the "meaning" of an event stream and, by implication, the equivalence of two event streams.
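The three advances listed above can be contrasted in one small sketch: a content-based subscription over several schema fields, a stateless transform that normalizes two publisher formats, and a stateful operation whose output depends on history. Everything here is hypothetical and chosen for illustration: the `TRADE_SCHEMA` fields, the two publisher formats, and the "k consecutive increases" trend rule are assumptions, not part of the disclosure.

```python
import operator
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]

# 1. Content-based subscription: a conjunction of (field, operator, value) constraints
#    over any field of a hypothetical "trades" schema, not just a single subject field.
TRADE_SCHEMA = {"symbol": str, "price": float, "volume": int}
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, ">=": operator.ge, "<=": operator.le}


def make_subscription(schema: Dict[str, type], constraints: List[Tuple[str, str, object]]) -> Callable[[Event], bool]:
    for field, op, _ in constraints:
        if field not in schema or op not in OPS:
            raise ValueError(f"bad constraint on {field!r} with {op!r}")
    return lambda e: all(OPS[op](e[field], value) for field, op, value in constraints)


# 2. Stateless transform: the output depends only on the current event. The two input
#    formats are hypothetical "similar but not identical" publishers.
def to_common_schema(event: Event) -> Event:
    if "ticker" in event:  # publisher A: price as a decimal string
        return {"symbol": event["ticker"], "price": float(event["last"])}
    return {"symbol": event["sym"], "price": event["price_cents"] / 100}  # publisher B: integer cents


# 3. Stateful operation: emits a derived "trend" event after k consecutive price
#    increases for a symbol, so its output depends on the event history.
class RisingTrendDetector:
    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.last_price: Dict[str, float] = {}
        self.rises: Dict[str, int] = defaultdict(int)

    def on_event(self, event: Event) -> Optional[Event]:
        sym, price = event["symbol"], event["price"]
        prev = self.last_price.get(sym)
        # Extend the run of increases, or reset it when the price does not rise.
        self.rises[sym] = self.rises[sym] + 1 if prev is not None and price > prev else 0
        self.last_price[sym] = price
        if self.rises[sym] == self.k:
            return {"symbol": sym, "trend": "up", "increases": self.k}
        return None


big_ibm = make_subscription(TRADE_SCHEMA, [("symbol", "=", "IBM"), ("price", ">", 100.0)])
print(big_ibm({"symbol": "IBM", "price": 120.5, "volume": 10}))  # True
print(to_common_schema({"sym": "IBM", "price_cents": 12050}))   # {'symbol': 'IBM', 'price': 120.5}
detector = RisingTrendDetector(k=3)
trend = [detector.on_event({"symbol": "IBM", "price": p}) for p in (10.0, 11.0, 12.0, 13.0)]
print(trend[-1])  # {'symbol': 'IBM', 'trend': 'up', 'increases': 3}
```

Note that feeding the event `{"symbol": "IBM", "price": 13.0}` to a fresh `RisingTrendDetector` yields `None`, whereas the same event after three rising prices yields a trend event; that history dependence is what distinguishes the third category from the second.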
Designers of such systems often face the challenge of reducing abstract descriptions (e.g., Information Flow Graphs) of select and transform sequences between nodes to a system design that uses resources efficiently, and of redesigning existing systems with the same goals. One particular problem involves consolidating transform operations at the periphery of the network and select operations at the interior, so that existing techniques for efficient content-based subscription can serve as the basis for an optimized middleware implementation.
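The consolidated design described above, with selection in the interior and transformation at the periphery, might look roughly like the following. This is a hypothetical sketch, not the disclosed architecture: `ConsolidatedBroker` and its methods are invented names, each subscriber is assumed to already be expressed as one predicate plus one edge transform, and a linear scan stands in for the indexed matching a real content-based system would use.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Select = Callable[[Message], bool]
Transform = Callable[[Message], Message]


class ConsolidatedBroker:
    """Hypothetical target design: selection in one interior matcher, transforms only at subscriber edges."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Tuple[Select, Transform]] = {}
        self.delivered: Dict[str, List[Message]] = {}

    def subscribe(self, name: str, predicate: Select, edge_transform: Transform) -> None:
        self.subscriptions[name] = (predicate, edge_transform)
        self.delivered[name] = []

    def publish(self, message: Message) -> None:
        # Interior: content-based matching on the original, untransformed message.
        # A linear scan stands in for the indexed matching of a real content-based system.
        matched = [name for name, (pred, _) in self.subscriptions.items() if pred(message)]
        # Periphery: each subscriber's transform runs only on messages that subscriber receives.
        for name in matched:
            _, transform = self.subscriptions[name]
            self.delivered[name].append(transform(message))


broker = ConsolidatedBroker()
broker.subscribe("eu_desk", lambda m: m["region"] == "EU", lambda m: {"eur": m["usd"] * 0.5})
broker.subscribe("audit", lambda m: True, lambda m: dict(m))
broker.publish({"region": "EU", "usd": 10.0})
broker.publish({"region": "US", "usd": 4.0})
print(broker.delivered["eu_desk"])  # [{'eur': 5.0}]
```

The conversion rate `0.5` and the `region`/`usd` fields are illustrative values only.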
The present invention provides techniques (methods, systems, and computer program products) for reducing and optimizing publish/subscribe systems, wherein any operation sequence from one node to another may be replaced by an equivalent sequence consisting of a single select operation followed by a single transform operation. The optimization rules disclosed herein can be applied automatically during system design or redesign, much as computer programs are automatically optimized by state-of-the-art compilers.
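One way to see why any stateless sequence reduces to a single select followed by a single transform is an induction over three rewrite identities. The notation is introduced here for illustration and is not taken from the disclosure: $\sigma_p$ keeps the messages $m$ for which $p(m)$ holds, $\tau_f$ replaces each message $m$ by $f(m)$, "$A ; B$" means $A$ followed by $B$, and $p \wedge q$ denotes the predicate $m \mapsto p(m) \wedge q(m)$.

```latex
\begin{align*}
\sigma_p ; \sigma_q &\equiv \sigma_{p \wedge q}
  && \text{(combine selects)} \\
\tau_f ; \tau_g &\equiv \tau_{g \circ f}
  && \text{(combine transforms)} \\
\tau_f ; \sigma_q &\equiv \sigma_{q \circ f} ; \tau_f
  && \text{(move a select ahead of a transform)}
\end{align*}
% Base case: the empty sequence is \sigma_{\top} ; \tau_{\mathrm{id}}.
% Inductive step: if a prefix reduces to \sigma_P ; \tau_F, appending one more operation gives
\begin{align*}
\sigma_P ; \tau_F ; \sigma_q &\equiv \sigma_P ; \sigma_{q \circ F} ; \tau_F
  \equiv \sigma_{P \wedge (q \circ F)} ; \tau_F \\
\sigma_P ; \tau_F ; \tau_g &\equiv \sigma_P ; \tau_{g \circ F}
\end{align*}
```

Each step keeps the form "one select, then one transform", so by induction every finite sequence of stateless selects and transforms has that form.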
Using a set of design rules disclosed herein, any system flow diagram (e.g., an Information Flow Graph) may be converted, or "rewritten," so that select operations are combined and "pushed" toward the sources (publishers), and transform operations are combined and "pushed" toward the subscribers. (Because transforms may destroy information, they cannot, in general, be "pushed" ahead of selects.)
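The asymmetry noted above can be checked on a small example. The sketch assumes dictionary messages and a hypothetical `to_eur` transform that drops its input field; `push_select_ahead` is an illustrative name. Moving a select ahead of a transform always works, because the new predicate can apply the transform itself, but the reverse move fails once the transform has discarded a field the select needs.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Select = Callable[[Message], bool]
Transform = Callable[[Message], Message]


def transform_then_select(f: Transform, p: Select, stream: List[Message]) -> List[Message]:
    return [m for m in map(f, stream) if p(m)]


def select_then_transform(p: Select, f: Transform, stream: List[Message]) -> List[Message]:
    return [f(m) for m in stream if p(m)]


def push_select_ahead(f: Transform, p: Select) -> Tuple[Select, Transform]:
    """Rewrite "transform f, then select p" as "select (p after f), then transform f"."""
    return (lambda m: p(f(m))), f


def to_eur(m: Message) -> Message:
    # Hypothetical transform: convert at an illustrative fixed rate and drop the "usd" field.
    return {"eur": m["usd"] * 0.5}


def over_50_eur(m: Message) -> bool:
    return m["eur"] > 50


stream = [{"usd": 80.0}, {"usd": 200.0}]
new_select, same_transform = push_select_ahead(to_eur, over_50_eur)
print(transform_then_select(to_eur, over_50_eur, stream) == select_then_transform(new_select, same_transform, stream))  # True

# The reverse move is not possible in general: a select on "usd" cannot be evaluated
# after to_eur, because to_eur has destroyed the "usd" field (it would raise KeyError).
```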
This rewriting can be performed by an automated system that takes as input a sequence of selects and transforms representing the flow and processing of messages. The system produces an optimized model that avoids applying transform functions to messages that would be eliminated by later select operations, and that "pushes" all selects in the information flow toward a single node, so that all event distribution (selection) can be accomplished by a single content-based publish/subscribe system.
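A toy version of such an automated rewriter is sketched below: it folds an arbitrary select/transform sequence into one combined predicate and one combined transform, and compares the result with step-by-step execution. The tuple encoding of operations and the names `reduce_to_select_transform` and `run_sequence` are assumptions for illustration. Because Python closures are opaque, the combined predicate here still calls the earlier transforms internally; a real optimizer would rewrite predicates symbolically (for example, turning `eur > 50` under `eur = usd * 0.5` into `usd > 100`) so that no transform runs on a message before it has passed selection.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Select = Callable[[Message], bool]
Transform = Callable[[Message], Message]
Op = Tuple[str, Callable]  # assumption: ("select", predicate) or ("transform", function)


def _and_after(p: Select, q: Select, f: Transform) -> Select:
    # Fold a later select q, which sees messages already transformed by f, into the leading select.
    return lambda m: p(m) and q(f(m))


def _then(f: Transform, g: Transform) -> Transform:
    # Combine consecutive transforms: apply f, then g.
    return lambda m: g(f(m))


def reduce_to_select_transform(ops: List[Op]) -> Tuple[Select, Transform]:
    """Rewrite a select/transform sequence into one select followed by one transform."""
    select: Select = lambda m: True     # empty sequence: accept every message ...
    transform: Transform = lambda m: m  # ... and leave it unchanged
    for kind, op in ops:
        if kind == "select":
            select = _and_after(select, op, transform)
        elif kind == "transform":
            transform = _then(transform, op)
        else:
            raise ValueError(f"unknown operation kind {kind!r}")
    return select, transform


def run_sequence(ops: List[Op], stream: List[Message]) -> List[Message]:
    """Reference semantics: apply the operations one at a time, in order."""
    for kind, op in ops:
        stream = [m for m in stream if op(m)] if kind == "select" else [op(m) for m in stream]
    return stream


ops: List[Op] = [
    ("transform", lambda m: {**m, "eur": m["usd"] * 0.5}),
    ("select", lambda m: m["eur"] > 50),
    ("transform", lambda m: {"eur": m["eur"]}),
    ("select", lambda m: m["eur"] < 500),
]
stream = [{"usd": 80.0}, {"usd": 200.0}, {"usd": 2000.0}]
sel, tr = reduce_to_select_transform(ops)
print([tr(m) for m in stream if sel(m)])  # [{'eur': 100.0}]
print(run_sequence(ops, stream))          # [{'eur': 100.0}]
```

Because `and` short-circuits, a message rejected by an early select never reaches the later parts of the combined predicate, which is a small-scale analogue of the saving described above.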
The present invention therefore advances the technology of messaging middleware and extends its range of application by providing optimizations that allow stateless messaging systems to be converted to a form that can exploit the efficient multicast technology developed for content-based publish/subscribe systems.