The present invention relates generally to data stream processing and relates more particularly to the optimization of data stream operations.
As Internet connections and network-connected sensor devices proliferate, the rate at which digital information becomes available from online sources continues to increase. These online sources continually generate and provide data (e.g., news items, financial data, sensor readings, Internet transaction records, and the like) to a network in the form of data streams. Data stream processing units are typically implemented in a network to receive or monitor these data streams and to process them to produce results in a usable format. For example, a data stream processing unit may be implemented to perform a join operation, in which related data items from two or more data streams (e.g., from two or more news sources) are selected and then aggregated or evaluated, for example to produce a list of results or to determine whether the data items corroborate one another.
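By way of illustration only, a join of the kind described above may be sketched as follows. The sketch assumes two streams labeled "A" and "B", items shaped as (timestamp, key, payload) tuples, and a fixed time window within which items are considered related. The function name `windowed_join`, the labels, the tuple layout, and the window parameter are all hypothetical and are not taken from any particular data stream processing unit.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

# Hypothetical item layout: (timestamp, join_key, payload).
Item = Tuple[float, str, str]


def windowed_join(
    events: Iterable[Tuple[str, Item]], window: float
) -> Iterator[Tuple[Item, Item]]:
    """Pair items from streams "A" and "B" that share a key and lie
    within `window` time units of each other.

    `events` is assumed to be a single time-ordered sequence of
    (stream_label, item) pairs, where stream_label is "A" or "B".
    Each result is yielded as (item_from_A, item_from_B).
    """
    buffers = {"A": deque(), "B": deque()}
    for source, item in events:
        other = "B" if source == "A" else "A"
        ts, key, _payload = item
        # Expire buffered items of the opposite stream that are too old.
        while buffers[other] and buffers[other][0][0] < ts - window:
            buffers[other].popleft()
        # Emit one result for every buffered item with the same key.
        for candidate in buffers[other]:
            if candidate[1] == key:
                yield (item, candidate) if source == "A" else (candidate, item)
        # Keep the new item so later items of the opposite stream can match it.
        buffers[source].append(item)
```

Every arriving item is compared against all buffered items of the opposite stream, so the work per item in this sketch grows with the number of items held in the window and therefore with the input rates of the streams.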
However, the input rates of typical data streams present a challenge. Because data stream processing units have no control over the sometimes sporadic and unpredictable rates at which data streams are input, it is not uncommon for a data stream processing unit to become loaded beyond its capacity, especially during rate spikes. Typical data stream processing units deal with such loading problems by arbitrarily dropping data streams (e.g., by declining to receive the data streams). While this does reduce loading, the arbitrary nature of the strategy tends to produce unpredictable and sub-optimal data processing results, because data streams containing useful data may be dropped inadvertently while data streams containing irrelevant data are retained and processed.
Thus, there is a need in the art for a method and apparatus for adaptive in-operator load shedding.