The present invention relates generally to the processing of data streams by a plurality of tasks and, more particularly, to load distribution by migrating tasks to target nodes that meet predetermined criteria for load distribution quality, and/or by migrating tasks to target nodes so as to reduce power and/or cooling costs.
With the proliferation of Internet connections and network-connected sensor devices, digital information is becoming available from a large number of online sources at an increasing rate. These online sources continually generate data (e.g., news items, financial data, sensor readings, Internet transaction records, and the like) and provide it to a network in the form of data streams. Data stream processing units are typically deployed in a network to receive or monitor these data streams and to process them into results in a usable format. For example, a data stream processing unit may perform a join operation in which related data items from two or more data streams (e.g., from two or more news sources) are culled and then aggregated or evaluated, for example to produce a list of results or to corroborate one another.
However, the input rates of typical data streams present a challenge. Because data stream processing units have no control over the sometimes sporadic and unpredictable rates at which data streams arrive, it is not uncommon for a data stream processing unit to become loaded beyond its capacity, especially during rate spikes. Typical data stream processing units deal with such overload by arbitrarily dropping data streams (e.g., declining to receive them). While this does reduce the load, the arbitrary nature of the strategy tends to produce unpredictable and sub-optimal processing results, because data streams containing useful data may unknowingly be dropped while data streams containing irrelevant data are retained and processed. Because a cluster of machines can share a workload, the present inventors propose a different strategy: using multiple nodes to handle the workload. Conversely, if this strategy is in use when the data stream volume drops, moving tasks back onto fewer nodes and quiescing some nodes altogether can lower power and cooling costs.
Most known solutions for load distribution in event-driven systems assume that event processing components are stateless. Very few target stateful operators, because migrating a stateful operator for load distribution purposes is challenging and expensive. To migrate a stateful operator, all data stream processing has to be stopped, all necessary state has to be migrated, and all event routing paths have to be updated.
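As a rough illustration of why such a migration is costly, the following minimal Python sketch models the three steps named above (stop processing, move the operator's state, update the routing) in a single process. The class StreamSystem, its methods, the node names, and the event-recording operator are hypothetical and are not part of the described invention; networking, failures, and concurrency are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class StreamSystem:
    """Hypothetical toy model: each node holds per-operator state, and a
    routing table maps each operator id to the node that currently hosts it."""
    node_state: dict                      # node name -> {operator id -> state}
    routing: dict                         # operator id -> node name
    paused: bool = False
    buffered: list = field(default_factory=list)

    def deliver(self, op_id, event):
        if self.paused:
            # Processing is stopped, so incoming events must be held back.
            self.buffered.append((op_id, event))
            return
        state = self.node_state[self.routing[op_id]][op_id]
        # Hypothetical stateful operator: it remembers every event it has seen.
        state.setdefault("seen", []).append(event)

    def stop(self):
        # Step 1: stop data stream processing.
        self.paused = True

    def move_state(self, op_id, target):
        # Step 2: move the operator's state from its current node to the target.
        source = self.routing[op_id]
        self.node_state[target][op_id] = self.node_state[source].pop(op_id)

    def update_route(self, op_id, target):
        # Step 3: redirect the operator's event routing path to the target.
        self.routing[op_id] = target

    def resume(self):
        # Restart processing and replay the events held during the pause.
        self.paused = False
        pending, self.buffered = self.buffered, []
        for op_id, event in pending:
            self.deliver(op_id, event)

    def migrate(self, op_id, target):
        self.stop()
        self.move_state(op_id, target)
        self.update_route(op_id, target)
        self.resume()
```

In this sketch, every event that arrives between stop() and resume() waits in the buffer, so the cost of a migration grows with both the size of the state being moved and the arrival rate during the pause. This is one way to see why stateless designs are the more common assumption.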