1. Field of the Invention
Embodiments of the present invention relate generally to an improved data processing system and, in particular, to data stream processing. More specifically, embodiments of the present invention provide a system for dynamically scheduling, within a pipeline, algorithms that operate on a stream of data.
2. Description of the Related Art
A data stream is a real-time, continuous, ordered sequence of items. The items in a data stream may be ordered implicitly by arrival time or explicitly by timestamp. Continuous data streams arise naturally in domains such as network monitoring (e.g., telephone call records or web usage logs), sensor networks (e.g., meteorological measurements), and financial analysis. Applications that process arriving data streams do not store the streams in a repository; instead, they process them on the fly using continuous algorithms that require only a limited amount of memory.
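As an illustration only, a sliding-window average over a timestamp-ordered stream is one common kind of continuous, bounded-memory algorithm. The sketch below is not taken from the invention; the function name `windowed_mean` and the parameter `window_seconds` are hypothetical. It keeps only the items inside the current time window, so memory depends on how many items fall in that window rather than on the total length of the stream.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def windowed_mean(stream: Iterable[Tuple[float, float]],
                  window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Hypothetical continuous algorithm: yield (timestamp, mean of the
    values seen in the last `window_seconds`) for each arriving item.

    Assumes items arrive ordered by timestamp and window_seconds > 0.
    Items are processed one at a time and never stored beyond the window.
    """
    window: Deque[Tuple[float, float]] = deque()  # (timestamp, value) pairs in the window
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict items that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Usage: readings as (timestamp, value) pairs, 2-second window.
readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (3.5, 40.0)]
for ts, mean in windowed_mean(readings, 2.0):
    print(ts, mean)
```

Because the function is a generator, it can consume an unbounded source (for example, a socket reader) and emit results as items arrive, rather than waiting for the stream to end.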
In data stream processing, there is a class of computational problems known as streaming problems. In a typical streaming problem, a large volume of continuous data is received at a processing application. The processing application runs a number of processing algorithms on the data stream, usually in parallel. Each processing algorithm comprises one or more queries, which are evaluated against the data stream to find matching data. If data in the stream matches one or more of these queries, the processing application identifies the data stream as ‘relevant’ and stores the data for future, often more in-depth, analysis. If no query matches, the processing application identifies the data stream as ‘not relevant’ and discards it. Thus, a relevant data stream is one that contains a match to at least one query in the processing algorithms, and a non-relevant data stream is one that matches none of them.
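The relevance test described above can be sketched as follows, under the assumption that each query is modeled as a simple predicate over a chunk of stream data. The names `Query`, `classify_stream`, and `store` are hypothetical and do not come from the source. A thread pool stands in for running the queries in parallel; for CPU-heavy queries a process pool would be the more typical choice in Python.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model: a query is a predicate that reports whether data matches.
Query = Callable[[str], bool]


def classify_stream(data: str, queries: List[Query], store: List[str]) -> bool:
    """Evaluate every query against `data` concurrently.

    If at least one query matches, the data is 'relevant': it is appended to
    `store` for later analysis and True is returned. Otherwise the data is
    'not relevant': nothing is retained and False is returned.
    """
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda query: query(data), queries))
    if any(results):
        store.append(data)  # keep for future, more in-depth analysis
        return True
    return False  # no match: the data is discarded


# Usage: two illustrative queries, one keyword match and one regex match.
queries: List[Query] = [
    lambda d: "ALERT" in d,
    lambda d: re.search(r"\d{4}", d) is not None,
]
store: List[str] = []
print(classify_stream("ALERT: link down", queries, store))
print(classify_stream("all quiet", queries, store))
print(store)
```

Any single match is enough to mark the data relevant, which mirrors the "at least one query" definition in the text.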
An example of an existing data stream processing application is SETI (Search for Extraterrestrial Intelligence). In the search for extraterrestrial life, numerous algorithms are applied to continuous signal transmissions received from space to find patterns that may indicate an intelligent origin.