1. Field of the Invention
The present invention relates generally to the data processing field and, more particularly, to a decentralized computer implemented method, system and computer usable program code for dynamically optimizing component placement in an event-driven component-oriented network data processing system that is subject to changes in function, infrastructure and/or performance.
2. Description of the Related Art
Component placement is an important factor in optimizing performance in an event-driven component-oriented network data processing system. Optimal component placement may be defined as placing the components in a flow (for instance, database query operators) onto an available set of machines in a network such that the placement minimizes the end-to-end latency for each path from producers to consumers of events.
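The definition above can be illustrated with a minimal sketch. It assumes a hypothetical cost model in which the latency of a path is the sum of per-component service times plus the link latencies between the machines hosting consecutive components, and it searches placements exhaustively, which is feasible only for very small flows. The function names, the pinned producer and consumer, and all numbers are illustrative assumptions and are not taken from any of the approaches discussed here.

```python
from itertools import product

def path_latency(path, placement, service_time, link_latency):
    """Service times along the path plus link latency between consecutive hosts."""
    total = sum(service_time[c] for c in path)
    for a, b in zip(path, path[1:]):
        total += link_latency[placement[a]][placement[b]]
    return total

def best_placement(components, machines, pinned, paths, service_time, link_latency):
    """Try every assignment of free components to machines (exponential cost).

    Returns (worst-path latency, placement) for the assignment whose slowest
    producer-to-consumer path is fastest.
    """
    best = None
    for assignment in product(machines, repeat=len(components)):
        placement = dict(pinned)
        placement.update(zip(components, assignment))
        cost = max(path_latency(p, placement, service_time, link_latency)
                   for p in paths)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

# Hypothetical example: one path, producer fixed on m1, consumer fixed on m3.
machines = ["m1", "m2", "m3"]
link_latency = {
    "m1": {"m1": 0, "m2": 1, "m3": 5},
    "m2": {"m1": 1, "m2": 0, "m3": 1},
    "m3": {"m1": 5, "m2": 1, "m3": 0},
}
service_time = {"producer": 0, "filter": 2, "join": 3, "consumer": 0}
paths = [["producer", "filter", "join", "consumer"]]
pinned = {"producer": "m1", "consumer": "m3"}

cost, placement = best_placement(["filter", "join"], machines, pinned,
                                 paths, service_time, link_latency)
print(cost, placement)
```

In this example the slow direct link between m1 and m3 makes it preferable to route the flow through m2. A static algorithm would compute such a placement once; the changes listed in this section would each alter the inputs to the cost function.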
Prior efforts that address the problem of component placement include the use of static centralized placement algorithms that are not responsive to changes that may occur in an event-driven component-oriented network data processing system. Such changes may include, for example:
1. Changes in function:
   a. Producers or consumers may be added or deleted.
   b. Components may be added, deleted or modified.
2. Changes in performance characteristics:
   a. Message rates from producers may change.
   b. Data may change causing workload on different components to change.
3. Changes in infrastructure:
   a. Server capacities may change or servers may go on or off line.
   b. Links between servers may become congested or unavailable.
Known approaches to static distributed component placement include “biological” approaches in which component placement is described in terms of activities performed by a colony of ants. In one known biological approach, ants visit nodes in a network data processing system and assign a task to each node such that the sum of the products of the flows between activities and the distances between their locations is minimized. Since the tasks are static and the flow between activities is not governed by a stream in which the data rate can vary, the scope of these algorithms does not extend to changes in the network data processing system such as those described above.
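The objective in the ant-based approach resembles a quadratic assignment cost, in which each flow between two activities is weighted by the distance between the locations assigned to them. The following sketch evaluates that cost and, for a tiny instance, finds the minimum by exhaustive search rather than by ant colony optimization. The matrices and function names are hypothetical and serve only to show that the flows and distances are fixed inputs.

```python
from itertools import permutations

def qap_cost(flow, dist, assignment):
    """Sum over activity pairs of flow[i][j] * distance between their locations.

    assignment[i] is the location index given to activity i.
    """
    n = len(flow)
    return sum(flow[i][j] * dist[assignment[i]][assignment[j]]
               for i in range(n) for j in range(n))

def best_assignment(flow, dist):
    """Exhaustive search over all one-to-one assignments (tiny instances only)."""
    n = len(flow)
    return min((qap_cost(flow, dist, p), p) for p in permutations(range(n)))

# Hypothetical symmetric flow and distance matrices for three activities.
flow = [[0, 5, 1],
        [5, 0, 2],
        [1, 2, 0]]
dist = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]

print(best_assignment(flow, dist))
```

Because `flow` is a constant matrix here, any change in data rate would require the whole assignment to be recomputed from scratch.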
Another class of task placement problems for which ant colony optimization algorithms have been proposed is referred to as the “Job-Shop Scheduling” problem. In the Job-Shop Scheduling problem, a set of machines and a set of jobs are given. Each job consists of an ordered sequence of operations. The problem is to assign the operations to time intervals in such a way that the maximum of the completion times of all operations is minimized and no two jobs are processed at the same time on the same machine. In this problem, the jobs are independent tasks that need to be completed and there is no event flow between the tasks.
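As a rough illustration of the Job-Shop Scheduling objective, the sketch below builds a schedule from a dispatch order and reports the maximum completion time (the makespan). It assumes a simple representation in which each job is a list of hypothetical (machine, duration) operations and the dispatch order names a job once for each of its operations. It does not implement ant colony optimization; it only evaluates candidate orders.

```python
def makespan(jobs, order):
    """Schedule operations in the given dispatch order and return the makespan.

    jobs[j] is an ordered list of (machine, duration) operations.
    order lists job indices; the k-th occurrence of j dispatches job j's k-th
    operation. Each operation starts as soon as both its job's previous
    operation and its machine are free, so no machine runs two jobs at once.
    """
    next_op = [0] * len(jobs)      # index of the next operation per job
    job_ready = [0] * len(jobs)    # time each job's previous operation ends
    machine_ready = {}             # time each machine becomes free
    for j in order:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        end = start + duration
        job_ready[j] = end
        machine_ready[machine] = end
        next_op[j] += 1
    return max(job_ready)

# Hypothetical instance: two jobs, two machines (0 and 1).
jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 for 3 units, then machine 1 for 2
    [(1, 2), (0, 4)],  # job 1: machine 1 for 2 units, then machine 0 for 4
]

print(makespan(jobs, [0, 1, 0, 1]))  # interleaved dispatch
print(makespan(jobs, [0, 0, 1, 1]))  # job 0 entirely before job 1
```

The two dispatch orders yield different makespans for the same jobs, which is the quantity the optimization seeks to minimize. Nothing in the model carries events from one job to another.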
A genetic algorithm has also been proposed for static file and task placement in a distributed system. The problem addressed is to find the optimal placement of files and tasks at sites with the objective of minimizing the total cost of transmitting files between sites and of ensuring that the aggregate capacity of any site is not exceeded, given the requirements of each site. This method cannot easily be extended to address optimal component placement when there are dynamic changes in the network infrastructure, performance or the types of files and tasks that need to be placed.
To date, algorithms inspired by biology for task placement have not been extended to work effectively in a stream-based environment where there is a flow of events between tasks.
A centralized approach to component placement has also been proposed. In this approach, if conditions change in an event-driven component-oriented network data processing system, a centralized controller is responsible for re-computing an optimal component placement and updating the network. In some approaches, a dynamic load balancing strategy is developed in the context of continuous queries, and the centralized controller is employed to collect workload information and make load balancing decisions.
Yet another approach studies static component placement in a hierarchical stream acquisition architecture. A theoretical analysis of the problem is provided where the data rate is fixed, but there is no consideration of how the algorithm will adapt to dynamic changes in a network.
Another known algorithm provides a data flow aware load selection strategy that can help restrict the scattering of data flows and lead to lower communication cost. This approach does not minimize the end-to-end latency of queries. Its load balancing scheme is based on partner selection, which assigns a fixed number of candidate load balancing partners to each node, and load is moved individually between each machine and its partners.
Yet a further approach uses runtime monitoring information to adapt a decentralized placement algorithm that maximizes business utility, which is defined as a function of the required bandwidth, available bandwidth and delay on a given edge of the network. This approach proposes stream management middleware in which nodes self-organize into a utility-aware set of clusters, so that most reconfigurations take place only within clusters. However, the algorithm does not explicitly compute the impact of reconfiguration on service times, and it uses fixed thresholds to determine when to perform reconfigurations. Therefore, fluctuations in network conditions may compel the algorithm to reconfigure continuously.
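The concern about fixed thresholds can be illustrated with a small sketch. It assumes, purely for illustration, that a reconfiguration is triggered every time a monitored measurement crosses a fixed threshold; the function name, the threshold and the sample values are hypothetical and do not describe the cited approach's actual trigger logic.

```python
def count_reconfigurations(samples, threshold):
    """Count how many times consecutive samples cross the fixed threshold."""
    count = 0
    above = samples[0] > threshold
    for value in samples[1:]:
        now_above = value > threshold
        if now_above != above:   # crossing in either direction triggers a change
            count += 1
            above = now_above
    return count

# Hypothetical utilization traces measured against a fixed threshold of 0.8.
fluctuating = [0.78, 0.82, 0.79, 0.81, 0.77, 0.83]
steady_rise = [0.50, 0.55, 0.90, 0.92]

print(count_reconfigurations(fluctuating, 0.8))
print(count_reconfigurations(steady_rise, 0.8))
```

A measurement that hovers near the threshold triggers a reconfiguration on almost every sample, whereas a single sustained change triggers only one.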
Still another approach addresses the problem of optimal component composition in a distributed stream processing environment by using a hybrid approach that combines distributed composition probing with coarse-grain global state management on top of an overlay mesh. In this approach, an aggregation node periodically, at large time intervals, updates the global state with the states of all virtual links between all pairs of nodes in the overlay mesh. In addition to assuming the availability of coarse-grain global state information, this approach does not address the issue of how to dynamically perform component placement when the components are not already deployed on the network.
Another approach uses an initial centralized algorithm to assign tasks to machines, and controls the data input and output rates and CPU allocation for each node in order to achieve stability in the face of dynamic changes in the runtime environment.
In general, current approaches to component placement in a network data processing system are not fully satisfactory, and it would be desirable to provide a decentralized mechanism for dynamically optimizing component placement in an event-driven component-oriented network data processing system that is subject to changes in function, infrastructure and/or performance.