On-line information sources are increasingly taking the form of data streams, that is, time-ordered series of events or readings. Examples include live stock and option trading feeds in financial services, physical link statistics in networking and telecommunications, sensor readings in environmental monitoring and emergency response, and satellite and live experimental data in scientific computing. The proliferation of these sources has created a paradigm shift in how data is processed, moving away from the traditional “store and then process” model of database management systems toward the “on-the-fly processing” model of emerging data stream processing systems (DSPSs).
Flexible application composition is a major challenge in the development of large-scale distributed data stream processing applications. In a distributed environment, it is difficult to find an application partitioning scheme that leads to superior performance. The basic building blocks of a stream processing application should advantageously be of small granularity, each representing a simple operation. In general, this enables flexible decomposition of the processing and a better mapping of the application onto the characteristics of the underlying hardware. However, in existing approaches, such fine-grained stream operators may incur a large performance overhead in a distributed system due to inter-process communication.
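The trade-off described above can be illustrated with a minimal sketch. It is not taken from any particular DSPS; the operator names (`scale`, `offset`, `clamp`) and the two runner functions are hypothetical, and a `pickle` round trip is used only as a stand-in for the cost of crossing a process boundary. The same chain of small operators is run once with direct in-process calls and once with a simulated boundary crossing before every operator.

```python
import pickle
from typing import Callable, List, Sequence, Tuple

# Hypothetical fine-grained operators, each a single simple operation.
Operator = Callable[[float], float]


def scale(x: float) -> float:
    return x * 2.0


def offset(x: float) -> float:
    return x + 1.0


def clamp(x: float) -> float:
    return min(x, 100.0)


def run_in_process(ops: Sequence[Operator], stream: Sequence[float]) -> List[float]:
    """Run all operators in one process: each hop is a plain function call."""
    results = []
    for item in stream:
        for op in ops:
            item = op(item)
        results.append(item)
    return results


def run_across_boundaries(
    ops: Sequence[Operator], stream: Sequence[float]
) -> Tuple[List[float], int]:
    """Treat every operator as if it ran in its own process.

    Assumption: a pickle round trip stands in for inter-process
    communication. Returns the results and the number of crossings.
    """
    results = []
    crossings = 0
    for item in stream:
        for op in ops:
            item = pickle.loads(pickle.dumps(item))  # simulated boundary crossing
            crossings += 1
            item = op(item)
        results.append(item)
    return results, crossings


pipeline = [scale, offset, clamp]
readings = [1.0, 60.0]

fused_results = run_in_process(pipeline, readings)
split_results, crossings = run_across_boundaries(pipeline, readings)
```

Both runs produce the same results, but the second performs one crossing per operator per stream item, so the communication cost grows with the number of operators in the chain. This is the overhead that makes fine-grained operators costly when each is placed in a separate process.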