Stream processing systems take continuous streams of data as inputs, process these data in accordance with prescribed processes, and produce ongoing results. Commonly used stream processing systems perform traditional database operations on the input streams. Examples of these systems are described in Daniel J. Abadi et al., The Design of the Borealis Stream Processing Engine, CIDR 2005, Second Biennial Conference on Innovative Data Systems Research (2005); Sirish Chandrasekaran et al., Continuous Dataflow Processing for an Uncertain World, Conference on Innovative Data Systems Research (2003); and The STREAM Group, STREAM: The Stanford Stream Data Manager, IEEE Data Engineering Bulletin, 26(1), (2003). In general, these systems use traditional database structures and operations because structures and operations for customized applications are substantially more complicated than the database paradigm. The reasons for this are illustrated, for example, in Michael Stonebraker, Uğur Çetintemel, and Stanley B. Zdonik, The 8 Requirements of Real-Time Stream Processing, SIGMOD Record, 34(4):42-47, (2005).
These systems typically operate independently and use only the processing resources contained within a single system to analyze streams of data that are either produced by, or directly accessible to, that single site. Although multiple sites can be used, these sites operate independently and do not share resources or data.
In a data stream processing framework, a subset of the available processing elements is used together to analyze, filter, and annotate streams of data. Continuous data streams flow between processing elements in accordance with a job plan. Each processing element performs a specific processing task on the stream of data. The job plan can be assembled manually or generated automatically by a job manager. In larger continuous data stream processing applications, job plans are more likely to be generated automatically because of the large number of available processing elements. Many processing elements may perform similar tasks with slightly different input or output requirements or implementations. Manually determining which processing elements, out of thousands available, to use together to produce the desired analysis and results is extremely difficult, if not impossible. An automatically generated job plan allows stream processing jobs to be planned and updated without manual intervention. Therefore, if a new processing element that provides better performance or results is introduced into the data stream processing system, that new processing element can be automatically inserted into the appropriate jobs.
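By way of illustration only, automatic job planning of the kind described above can be sketched as follows. The sketch is hypothetical and is not taken from any of the cited systems: the names ProcessingElement, plan_job, consumes, produces, and score, and the greedy selection rule, are all assumptions made for this example. Each processing element declares the stream type it accepts and the stream type it emits, and the planner chains the highest-scoring compatible element at each step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingElement:
    # All fields are hypothetical, chosen only for this illustration.
    name: str
    consumes: str   # stream type accepted as input
    produces: str   # stream type emitted as output
    score: float    # assumed quality/efficiency ranking; higher is better


def plan_job(elements, source_type, goal_type):
    """Greedily chain the highest-scoring element at each step.

    A simplification: a real planner would likely search over whole
    plans rather than commit to one element at a time.
    """
    plan = []
    current = source_type
    seen = {current}  # stream types already produced, to avoid cycles
    while current != goal_type:
        candidates = [
            e for e in elements
            if e.consumes == current and e.produces not in seen
        ]
        if not candidates:
            raise ValueError(f"no processing element consumes {current!r}")
        best = max(candidates, key=lambda e: e.score)
        plan.append(best)
        current = best.produces
        seen.add(current)
    return plan


elements = [
    ProcessingElement("filter_v1", "raw", "filtered", 0.6),
    ProcessingElement("filter_v2", "raw", "filtered", 0.9),
    ProcessingElement("annotate", "filtered", "annotated", 0.8),
]
before = plan_job(elements, "raw", "annotated")

# A newly introduced, better-scoring element is picked up on the next
# planning pass without any manual change to the job definition.
elements.append(ProcessingElement("filter_v3", "raw", "filtered", 0.95))
after = plan_job(elements, "raw", "annotated")

print([e.name for e in before])
print([e.name for e in after])
```

In this sketch, replanning after filter_v3 is registered replaces filter_v2 in the job, which mirrors the automatic insertion of a new processing element described above.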
While automatic job planning dynamically generates the job plan using the most effective or efficient processing elements for a given inquiry, a sub-optimal selection of processing elements may be necessary to keep results consistent over time. Automatic job planning, however, always produces a job plan based on the optimal use of the currently available processing elements. Therefore, a need exists for job planners that can determine instances in which a user may wish to use a less-than-optimal set of processing elements.
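The tension described above can be made concrete with a small hypothetical sketch. It is an illustration of the stated need only, not a description of any particular planner: the names select_elements, pinned, task, and score are assumptions introduced for this example. An optional mapping lets a user hold a task to a previously used processing element even when a higher-scoring alternative is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingElement:
    # Hypothetical fields, chosen only for this illustration.
    name: str
    task: str     # the processing task this element performs
    score: float  # assumed ranking; higher is better


def select_elements(elements, tasks, pinned=None):
    """Pick one element per task.

    By default the highest-scoring element wins. `pinned` (an assumed
    mechanism) maps a task to the name of an element that must be used
    instead, for example to keep results consistent over time.
    """
    pinned = pinned or {}
    by_name = {e.name: e for e in elements}
    plan = []
    for task in tasks:
        if task in pinned:
            choice = by_name[pinned[task]]
            if choice.task != task:
                raise ValueError(f"{choice.name!r} does not perform {task!r}")
        else:
            candidates = [e for e in elements if e.task == task]
            if not candidates:
                raise ValueError(f"no processing element performs {task!r}")
            choice = max(candidates, key=lambda e: e.score)
        plan.append(choice)
    return plan


elements = [
    ProcessingElement("filter_v1", "filter", 0.6),
    ProcessingElement("filter_v2", "filter", 0.9),
    ProcessingElement("annotate", "annotate", 0.8),
]

optimal = select_elements(elements, ["filter", "annotate"])
consistent = select_elements(
    elements, ["filter", "annotate"], pinned={"filter": "filter_v1"}
)

print([e.name for e in optimal])
print([e.name for e in consistent])
```

Here the unpinned plan uses filter_v2, while the pinned plan deliberately keeps the lower-scoring filter_v1 so that output remains comparable with earlier runs.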