1. Field of the Invention
The present invention relates generally to an improved data processing system, and in particular to a computer-implemented method, data processing system, and computer program product for finding an optimal policy for controlling message flow in distributed stream processing.
2. Description of the Related Art
Information dissemination systems provide a wide class of information services over a distributed network. One example of an information dissemination system is a stateful publish-subscribe system. In such a system, a number of service providers continuously deliver information of interest to a variety of service subscribers. Examples of such information services include real-time stock quotes, intelligent routing based on road traffic, news delivery, and surveillance. Each service provider in such a system typically maintains a large set of data items and propagates updates on those data items to service subscribers in a timely manner.
A problem encountered with existing information dissemination systems is determining how frequently to propagate updates between servers in a distributed system, or between a server and a service subscriber. Although propagating updates immediately after they occur increases the timeliness of the data on the subscriber side, each propagated update consumes system resources, such as bandwidth in the communication network between service providers and subscribers. Extreme strategies for propagating updates include (1) propagating every update, which guarantees the timeliest delivery but may produce excessive message traffic, and (2) withholding updates indefinitely, which minimizes message traffic but prevents subscribers from obtaining timely information. Thus, updates can be sent either more frequently, providing more timely data at the cost of greater system utilization, or less frequently, providing less timely data but better system utilization. Furthermore, the cost of updates can have multiple components. For example, in addition to the direct system cost of each message sent, there may be an indirect cost, since more messages mean more congestion and, eventually, more delay.