A publish-subscribe middleware messaging system is a type of distributed stream processing system. Such a system may include, for example, publishing clients, message service providers, subscribing clients and a plurality of broker machines, or brokers. The broker machines constitute an overlay network responsible for managing messaging activities occurring between and among the publishing clients, message service providers and subscribing clients. Publishing clients generate input messages (also called events), each of which contains one or more topics and data content, and submit them to the overlay network of broker machines. The broker machines perform transforms on the information contained in the input messages according to pre-determined specifications, thereby transforming the input messages into output messages. The information transforms are implemented as one or more software modules that are distributed among the broker machines comprising the overlay network. The output messages containing the transformed information are then delivered to the subscribing clients. The pre-determined specifications are typically created by message service providers, who know the form and content of the information that is of interest to clients subscribing to their messaging services.
Publish-subscribe middleware messaging systems frequently operate in an anonymous manner, meaning that publishing clients may not know how many subscribing clients there are or where they are, and, similarly, subscribing clients may not know the identity or location of publishing clients.
Publish-subscribe middleware messaging systems may also operate on input message streams in either a so-called “stateless” or “stateful” manner. A “stateless” (also called topic-based or content-based) publish-subscribe system is one in which (1) delivered messages are a possibly filtered subset of published input messages, and (2) a subscription criterion selected by a message service provider is a property that can be tested on each message independently of any other message, such as “topic=stock-ticker” or “volume>10000 & issue=IBM”.
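By way of illustration only, the two stateless subscription criteria quoted above can be sketched as predicates that examine one message at a time. The dictionary message layout and the names `topic_is_stock_ticker`, `large_ibm_trade` and `stateless_filter` are assumptions made for this sketch and are not part of any particular messaging system.

```python
from typing import Callable, Dict, Iterable, List

# Assumed message layout: a flat dictionary of field names to values.
Message = Dict[str, object]


def topic_is_stock_ticker(msg: Message) -> bool:
    """Criterion "topic=stock-ticker", tested on a single message."""
    return msg.get("topic") == "stock-ticker"


def large_ibm_trade(msg: Message) -> bool:
    """Criterion "volume>10000 & issue=IBM", tested on a single message."""
    return msg.get("issue") == "IBM" and msg.get("volume", 0) > 10000


def stateless_filter(
    messages: Iterable[Message], predicate: Callable[[Message], bool]
) -> List[Message]:
    """Deliver the subset of published messages that satisfy the predicate.

    No information is carried over from one message to the next, which is
    what makes the subscription stateless.
    """
    return [msg for msg in messages if predicate(msg)]
```

Because each predicate depends only on the message in hand, the delivered messages are always unmodified copies of a subset of the published messages.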
A “stateful” publish-subscribe system is one in which subscriptions are “stateful”; that is, the publish-subscribe system creates output messages containing information derived from multiple messages culled from one or more message streams, e.g., “Give me the highest quote of IBM within each one-minute period.” Furthermore, this generally entails delivering information other than simply a copy of published messages, for example, “Tell me how many stocks fell during each one-minute period.”
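As a minimal sketch of the first stateful subscription quoted above, the following function retains a running maximum for each one-minute period, so each output value depends on several input messages. The field names `timestamp` (in seconds), `issue` and `price`, and the function name `highest_quote_per_minute`, are assumptions made for this illustration.

```python
from typing import Dict, Iterable


def highest_quote_per_minute(
    messages: Iterable[dict], issue: str = "IBM"
) -> Dict[int, float]:
    """Return the highest quote of `issue` within each one-minute period.

    The result maps a window index (timestamp in seconds, integer-divided
    by 60) to the highest price seen in that window. The `highs` dictionary
    is the state carried from message to message.
    """
    highs: Dict[int, float] = {}
    for msg in messages:
        if msg["issue"] != issue:
            continue  # quotes for other issues do not affect this subscription
        window = int(msg["timestamp"] // 60)
        if window not in highs or msg["price"] > highs[window]:
            highs[window] = msg["price"]
    return highs
```

Unlike the stateless case, the delivered values are derived information rather than copies of published messages.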
In both the stateless and stateful cases, publish-subscribe middleware messaging systems are implemented as overlay networks, that is, as collections of broker machines that accept messages from publishing clients, deliver subscribed information to subscribing clients, and route information between publishing clients and subscribing clients.
Once a publish-subscribe middleware messaging system starts computing transforms, the placement of the software modules performing these computations becomes central to the performance of the messaging system. At a high level, this problem is similar to many earlier task assignment problems in parallel and distributed systems. However, the transform tasks that perform stream processing of database operators have unique properties. These tasks are always available and therefore always running, and their resource utilization is a function of the incoming message rates generated by publishing clients. The data flows from specific sources (publishing clients) to specific sinks (subscribing clients), which fixes some tasks to specific processors. Furthermore, load balancing the system for better resource utilization, a common objective in such situations, is not as important here. Typically, client subscription update latency and throughput are the more important system performance metrics, and their improvement or optimization is often the key objective.
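To make the shape of the placement problem concrete, the following toy sketch exhaustively places a linear chain of transforms on brokers so as to minimize the summed link latency between a pinned source broker and a pinned sink broker. It is an illustration under strong simplifying assumptions (a linear chain, a fixed per-link latency table, no processing cost and no load dependence) and is not the placement method of any particular system. The names `chain_latency` and `best_placement` and the latency table format are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# Assumed format: link_latency[a][b] is the latency from broker a to broker b.
LatencyTable = Dict[str, Dict[str, float]]


def chain_latency(placement: Sequence[str], link_latency: LatencyTable) -> float:
    """Sum the link latencies between consecutive stages of the chain."""
    return sum(link_latency[a][b] for a, b in zip(placement, placement[1:]))


def best_placement(
    brokers: Sequence[str],
    link_latency: LatencyTable,
    source_broker: str,
    sink_broker: str,
    num_transforms: int,
) -> Tuple[Tuple[str, ...], float]:
    """Try every assignment of transforms to brokers and keep the cheapest.

    The source and sink stages are pinned to the brokers serving the
    publishing and subscribing clients; only the transforms in between
    are free to move. On ties, the first assignment found is kept.
    """
    best_assignment: Tuple[str, ...] = ()
    best_cost = float("inf")
    for middle in product(brokers, repeat=num_transforms):
        placement = (source_broker, *middle, sink_broker)
        cost = chain_latency(placement, link_latency)
        if cost < best_cost:
            best_assignment, best_cost = placement, cost
    return best_assignment, best_cost
```

Exhaustive search grows exponentially with the number of transforms, so it is suitable only for illustrating the objective, not for use at realistic scale.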
Accordingly, in publish-subscribe middleware messaging systems implemented through overlay networks that employ a plurality of broker machines, there exists a desire for a method and an apparatus for appropriately distributing information transforms among broker machines comprising the overlay network.
In addition, there exists a desire to have a selection of one or more performance metrics that would be improved or optimized through appropriate placement of information transforms among the broker machines comprising the overlay network. Particularly desired are methods and apparatus for placing information transforms among broker machines comprising the overlay network so that the latency and throughput of messaging activities performed by the broker machines comprising the overlay network are improved.