Many computational systems are designed, implemented, and deployed using a programming environment so as to operate on streaming data. The processing of data streams, such as audio data, video data, stock data, radio frequency data, digitized transducer outputs, test and measurement data, SQL query data, gene sequence data, etc., has taken on increasing importance as development environments attempt to simulate or model systems dependent upon streaming data, such as high throughput or time-critical processing systems.
Stream-based processing may be defined as the processing of data samples arriving sequentially over a period of time. A data sample may be scalar in nature, that is, a single data element. It may also be a vector, a matrix, a higher-dimensional regular array of data elements, etc. A data sample may also be irregular or non-uniform in structure, depending upon the nature of the intended application. Continuous-time data may be sampled discretely in time to produce a sampled sequence of streaming data. The sequence of data samples over time may have periodic sampling (that is, uniformly sampled over time) or may be aperiodic with respect to the sampling interval. The duration of a data stream may be finite and short, or may be sufficiently long so as to be considered infinite in practice for a given application. The stream-processing system may therefore be designed to handle an infinite stream of data as a design requirement.
Streaming operations may also require the collection and processing of buffered sub-sequences of data in the data stream, the buffered data being referred to as a data buffer, a batch, or a data frame. A data frame may represent a finite time interval, and the processing of an infinite sequence of data frames may be a requirement of the stream processing system. Data may therefore be input to a stream processing system as individual samples or as frames of data samples. A data sample may include one or more data elements relating to the data at a particular time point.
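The grouping of individual samples into frames described above can be sketched as follows. This is a minimal illustration only; the function name frames and the parameter frame_size are hypothetical and are not drawn from any particular stream processing product. A generator is used so that a stream of effectively infinite duration can be framed without holding the whole stream in memory.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def frames(samples: Iterable[T], frame_size: int) -> Iterator[List[T]]:
    """Group a (possibly infinite) sample stream into frames of frame_size samples.

    Hypothetical helper for illustration; not part of any specific library.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    it = iter(samples)
    while True:
        # Buffer up to frame_size samples from the stream.
        frame = list(islice(it, frame_size))
        if not frame:
            return  # finite stream exhausted
        yield frame  # the last frame of a finite stream may be shorter


# An unbounded stream (0, 1, 2, ...) framed four samples at a time;
# only the first two frames are ever computed.
first_two = list(islice(frames(count(), 4), 2))

# A short finite stream; the trailing partial frame is kept.
finite = list(frames(range(5), 2))
```

Whether a trailing partial frame is emitted, padded, or discarded is a design choice; the sketch simply emits it.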
Different models of computation may be employed by the programming and/or modeling environment being utilized. A modeling environment may be either textual or graphical in nature, and each model of computation may impose certain design restrictions and semantic constraints. Dataflow is an example of one family of models of computation. Specific members of the dataflow family may include dynamic dataflow, synchronous dataflow, Boolean dataflow, and the like. A particular dataflow model of computation may impose restrictions on the types of computational semantics that can be modeled and implemented by the system, such as, for example, forbidding feedback, recursion, different mixtures of sample rates, different mixtures of consumption and production rates of the computational process, different mixtures of frame sizes, etc. A dataflow model may also offer certain capabilities to the model designer and/or user, such as, for example, providing higher data throughput, deterministic performance, or greater expressivity in terms of modeling semantics.
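As one illustration of how production and consumption rates constrain a model, consider a connection in a synchronous dataflow model, where each component produces and consumes a fixed number of samples each time it executes. For the connection to run indefinitely with bounded buffering, the number of samples produced per schedule period must equal the number consumed. The sketch below computes the smallest such firing counts for a single connection; the function name repetitions and its parameters are hypothetical.

```python
from math import gcd
from typing import Tuple


def repetitions(produce: int, consume: int) -> Tuple[int, int]:
    """Return the smallest positive (q_src, q_dst) with q_src * produce == q_dst * consume.

    produce: samples written by the source component per firing.
    consume: samples read by the destination component per firing.
    Hypothetical helper for a single connection between two components.
    """
    if produce <= 0 or consume <= 0:
        raise ValueError("rates must be positive integers")
    g = gcd(produce, consume)
    return consume // g, produce // g


# Source writes 2 samples per firing, destination reads 3 per firing:
# the source must fire 3 times for every 2 firings of the destination.
q_src, q_dst = repetitions(2, 3)
```

A full model would solve these balance conditions jointly over every connection; the sketch covers only the two-component case.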
Many stream processing systems are implemented using dataflow systems that may be modeled with a dataflow language, where the execution of a particular system component may be triggered by the availability of data at the inputs of that component.
For example, in a graphical programming language that implements a dataflow-based stream processing system, the program or model may include model components represented as blocks with inputs and/or outputs. The graphical program or model may also include arrows between the blocks, where the arrows are used to represent the flow of the input and output data. Components in these program or model environments may be executed as soon as all of their inputs become valid, depending upon how the development environment is implemented and the specific model of computation employed.
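The data-driven execution rule described above, in which a block executes once all of its inputs are valid, can be sketched in a few lines. The Block class, its methods, and the run function are hypothetical names chosen for illustration; an actual environment may schedule blocks quite differently depending on its model of computation.

```python
from collections import deque
from typing import Any, Callable, List, Tuple


class Block:
    """Hypothetical dataflow block that fires once every input port holds a value."""

    def __init__(self, name: str, func: Callable[..., Any], n_inputs: int) -> None:
        if n_inputs < 1:
            raise ValueError("this sketch assumes at least one input port")
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]  # one queue per input port
        self.targets: List[Tuple["Block", int]] = []      # outgoing arrows

    def connect(self, target: "Block", port: int) -> None:
        """Draw an arrow from this block's output to an input port of target."""
        self.targets.append((target, port))

    def ready(self) -> bool:
        # An empty deque is falsy, so this is True only if every port has data.
        return all(self.inputs)

    def fire(self) -> Any:
        args = [q.popleft() for q in self.inputs]  # consume one value per port
        result = self.func(*args)
        for target, port in self.targets:          # send the output downstream
            target.inputs[port].append(result)
        return result


def run(blocks: List[Block]) -> List[Tuple[str, Any]]:
    """Fire ready blocks until none can fire; return (name, output) in firing order."""
    log = []
    progressed = True
    while progressed:
        progressed = False
        for block in blocks:
            if block.ready():
                log.append((block.name, block.fire()))
                progressed = True
    return log


add = Block("add", lambda a, b: a + b, n_inputs=2)
scale = Block("scale", lambda x: 10 * x, n_inputs=1)
add.connect(scale, 0)

add.inputs[0].append(1)
stalled = run([add, scale])  # nothing fires: the second input of "add" is empty

add.inputs[1].append(2)
log = run([add, scale])      # "add" fires, and its output then enables "scale"
```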
Graphical programming environments that support stream processing operations may include block libraries that contain blocks associated with code for stream processing algorithms. The blocks provide a mechanism for programmers to add components to a model to handle the processing of stream data. The addition of the stream processing blocks to the model in the graphical modeling environment enables the execution of the associated stream processing algorithm. The stream processing algorithm creates a stream component with an internal state, calculates new output values using the state information and then updates the state information.
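The pattern in which a stream component holds internal state, calculates an output from that state, and then updates the state can be sketched as below. The Smoother class and its step method are hypothetical; a first-order smoothing filter is used here only as a convenient example of a stateful stream processing algorithm, not as the algorithm of any particular block library.

```python
class Smoother:
    """Hypothetical stateful stream component: first-order exponential smoothing."""

    def __init__(self, alpha: float, initial: float = 0.0) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = initial  # internal state carried between samples

    def step(self, sample: float) -> float:
        # 1. Calculate the new output from the input and the current state.
        output = self.alpha * sample + (1.0 - self.alpha) * self.state
        # 2. Update the state for the next sample.
        self.state = output
        return output


smoother = Smoother(alpha=0.5)
outputs = [smoother.step(x) for x in (4.0, 8.0)]
```

Because the state persists across calls to step, the same object can be driven one sample at a time for as long as the stream lasts.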
The semantic constraints and performance limitations imposed by the choice of a dataflow model of computation may limit the applicability of current design tools to the design of practical stream processing systems in text-based computing environments.