1. Field of the Invention
The present invention relates generally to stream processing and, in particular, to workflow composition. Still more particularly, the present invention provides a method, apparatus, and program product for stream processing workflow composition using automatic planning.
2. Description of the Related Art
Stream processing computing applications are applications in which data comes into the system in the form of an information flow, subject to some restrictions on the data. Note that the volume of data being processed may be too large to be stored and, therefore, the information flow must be processed on the fly. Examples of stream processing computing applications include video processing, audio processing, streaming databases, and sensor networks.
Component-based software engineering (CBSE) is concerned with the development of software-intensive systems from reusable parts (components), the development of reusable parts, and system maintenance and improvement by means of component replacement and customization, as well as the development of a framework for component composition. Composition may be done statically or dynamically. This disclosure is concerned with dynamic component composition.
We are concerned with a specific class of component-based systems, namely stream processing component-based systems. All of the composition details, including information about how to assemble the system from the components and how to configure the components, are stored in the workflow. The workflow may also carry additional information.
This approach enables increased code reuse, simplified development, and high flexibility of the system. Components may be interconnected in multiple configurations, achieving highly complex functionality via composition of simpler black-box operations. Such architectures are currently being developed in many application areas, in particular, stream processing applications.
In component-based stream processing architectures, stream processing applications are composed of several processing units or components. The processing units can receive information streams on one or more input ports and produce one or more output streams, which are sent out via output ports. The output streams are a result of processing the information arriving via the input streams, by filtering, annotating, or otherwise analyzing and transforming the information. Once an output stream is created, any number of other components can read data from it. All processing units together compose a workflow. A stream processing application reads and analyzes primal streams coming into the system and produces a number of output streams that carry the results of the analysis.
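By way of illustration only, the architecture described above may be sketched as follows. The sketch is not taken from any particular stream processing system; the names ProcessingUnit, Workflow, keep_large, annotate, and running_total, as well as the sample "sensor" stream, are hypothetical. It assumes that each processing unit produces a single output stream and that primal streams can be iterated more than once (here, plain lists).

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class ProcessingUnit:
    """Hypothetical processing unit: reads named input streams, writes one output stream."""
    name: str
    input_streams: List[str]          # streams read via input ports
    output_stream: str                # stream written via the output port
    process: Callable[..., Iterator]  # one iterator per input -> output iterator


class Workflow:
    """Hypothetical workflow: a set of processing units wired together by stream name."""

    def __init__(self, primal_streams: Dict[str, Iterable], units: List[ProcessingUnit]):
        self.primal_streams = primal_streams
        # Index units by the stream they produce so readers can look them up.
        self.producers = {u.output_stream: u for u in units}

    def stream(self, name: str) -> Iterator:
        """Return a fresh iterator over the named stream.

        Each reader gets its own iterator, so any number of units can read
        the same stream (upstream work is simply repeated per reader).
        """
        if name in self.primal_streams:
            return iter(self.primal_streams[name])
        unit = self.producers[name]
        inputs = [self.stream(s) for s in unit.input_streams]
        return unit.process(*inputs)


def keep_large(readings):
    # Filtering: drop readings of 5 or less.
    return (r for r in readings if r > 5)


def annotate(readings):
    # Annotating: attach a label to each reading.
    return ((r, "high" if r > 10 else "normal") for r in readings)


def running_total(readings):
    # Transforming: cumulative sum of the readings seen so far.
    return itertools.accumulate(readings)


workflow = Workflow(
    primal_streams={"sensor": [3, 12, 7, 25]},
    units=[
        ProcessingUnit("filter", ["sensor"], "filtered", keep_large),
        # Two different units read the same "filtered" stream.
        ProcessingUnit("annotate", ["filtered"], "annotated", annotate),
        ProcessingUnit("total", ["filtered"], "totals", running_total),
    ],
)

print(list(workflow.stream("annotated")))
print(list(workflow.stream("totals")))
```

In this sketch the workflow is simply the list of units together with the stream names that connect them, which is the information a person composing the workflow would have to supply by hand.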
Composing stream processing workflows is a labor-intensive task, which requires that the person building the workflow have extensive knowledge of component functionality and compatibility. In many cases, this makes it necessary for end-users of stream processing applications to contact application developers each time a new output information stream is requested and, as a result, a new workflow is needed. This process is costly, error-prone, and time-consuming. Also, changes to other elements of the stream processing system may require changes to the workflow. For example, processing units or primal streams may become unavailable, users may place certain restrictions on the output, or changes may be made to the components themselves.
In large practical stream processing systems, both changes in the data coming into the system and changes in the system configuration can invalidate deployed and running stream processing applications. With time, these applications can start to produce output that no longer satisfies the user's requirements, they can come to rely on primal streams that have become inactive, or they can be affected by other system changes, such as the addition of new hardware or new components/processing units. In many situations, users' requirements can be better satisfied if an existing workflow is updated with newly available primal streams or components/processing units. Therefore, when changes such as those described above occur, the workflow must be reconfigured quickly, before any potentially valuable streaming data is lost. Such timely reconfiguration is extremely difficult to achieve if the workflow composition requires human involvement.
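As a purely illustrative sketch of how such invalidation might be detected, the following function reports which units of a deployed workflow can no longer run when some primal streams become inactive. The function name broken_units, the representation of a unit as a pair of input stream names and an output stream name, and the sample unit and stream names are all hypothetical and are not drawn from any existing system.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: unit name -> (input stream names, output stream name).
Units = Dict[str, Tuple[List[str], str]]


def broken_units(units: Units, active_primal: Set[str]) -> List[str]:
    """Return the names of units whose inputs can no longer be produced."""
    available = set(active_primal)  # streams that can currently be read
    runnable: Set[str] = set()
    changed = True
    # Repeat until no further unit becomes runnable (a simple fixpoint).
    while changed:
        changed = False
        for name, (inputs, output) in units.items():
            if name not in runnable and all(s in available for s in inputs):
                runnable.add(name)
                available.add(output)
                changed = True
    return sorted(set(units) - runnable)


units: Units = {
    "filter": (["sensor"], "filtered"),
    "annotate": (["filtered"], "annotated"),
    "join": (["annotated", "weather"], "joined"),
}

# The "weather" primal stream has become inactive; only "join" is affected.
print(broken_units(units, {"sensor"}))
```

A check of this kind only identifies the affected units. Deciding how to rewire the workflow around them is the part that, as noted above, is difficult to do in time when it depends on human involvement.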