1. Technical Field
The present invention relates to stream processing in a concurrent system, such as a multi-core computer or a computer cluster.
2. Discussion of the Related Art
Massive multi-core systems hold the promise of greatly improving the performance of multi-threaded applications. Unfortunately, the complexity of these systems poses a challenge for the entire information technology (IT) industry. Concurrent programming is notoriously hard. For example, building a robust software system that runs cost-effectively on a cluster of multi-core computers is already beyond the reach of the average software engineer. If not carefully designed, multi-threaded applications can suffer from high data movement costs and inefficient central processing unit (CPU) usage. With single-core performance pushed to its limit, developers will, for the foreseeable future, take advantage of concurrent programming models to design and implement competitive software systems.
Stream processing is one of the programming models for concurrent applications. In contrast to the “pull-based” model used by conventional database management systems, stream processing applications use a “push-based” data access model. In a typical asynchronous stream processing system, a stream processing application is described by a flow diagram represented as a graph of operators. FIG. 1 shows an example of a stream processing application that analyzes weather and energy consumption data. In FIG. 1, data from two different sources is first processed independently, and the processing results are then jointly analyzed by a set of operators. Each operator consumes zero or more input streams and produces zero or more output streams.
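By way of illustration only, the push-based model over a graph of operators can be sketched as follows. This is a minimal sketch and not the system of FIG. 1: the class name Operator, the methods connect and push, and the temperature fields temp_c and temp_f are hypothetical names chosen for this example.

```python
from typing import Any, Callable, List


class Operator:
    """Hypothetical flow-graph node: tuples are pushed to it, never pulled."""

    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn
        self.downstream: List["Operator"] = []  # zero or more output streams

    def connect(self, other: "Operator") -> "Operator":
        """Add an edge from this operator to `other`; return `other` for chaining."""
        self.downstream.append(other)
        return other

    def push(self, item: Any) -> None:
        """Process one incoming tuple and push the result to every successor."""
        result = self.fn(item)
        if result is None:  # the operator emitted nothing for this tuple
            return
        for op in self.downstream:
            op.push(result)


collected = []
# Assumed example operators: a conversion, a filter, and a sink.
scale = Operator("scale", lambda t: {**t, "temp_f": t["temp_c"] * 9 / 5 + 32})
keep_hot = Operator("keep_hot", lambda t: t if t["temp_f"] > 80 else None)
sink = Operator("sink", collected.append)  # append returns None: no output stream

scale.connect(keep_hot).connect(sink)

# The source drives the computation by pushing each tuple into the graph.
for reading in ({"temp_c": 20.0}, {"temp_c": 30.0}):
    scale.push(reading)
```

In this sketch only the second reading reaches the sink, because the first is dropped by the filter operator. The source, rather than the consumer, initiates every step, which is the distinction from a pull-based query.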
Just as a relation in a database management system is often understood as a table, a stream in a stream processing system can be considered a sequence of tuples, each containing multiple attributes (also known as fields). Stream processing systems have a great advantage over conventional database systems for continuously generated data whose size is too large to store in a conventional database management system, and for event-based data whose diminishing relevance nullifies the importance of persistent storage.
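As a hedged illustration of a stream as a sequence of tuples with named attributes, the sketch below processes each tuple as it arrives and retains only a running aggregate, so the stream itself is never stored. The tuple type Reading, its fields station and temp_c, and the function running_max are assumptions made for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """Hypothetical stream tuple with two attributes (fields)."""
    station: str
    temp_c: float


def running_max(stream: Iterable[Reading]) -> Iterator[float]:
    """Yield the maximum temperature seen so far after each tuple.

    Only one number of state is kept, regardless of stream length.
    """
    best = float("-inf")
    for r in stream:
        best = max(best, r.temp_c)
        yield best


# A generator stands in for a continuously produced stream.
stream = (Reading("A", t) for t in (18.5, 21.0, 19.2))
maxima = list(running_max(stream))
```

Here maxima evaluates to [18.5, 21.0, 21.0]. Each tuple is discarded once it has been processed, which reflects the case in which persistent storage of the data is unnecessary or impractical.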
Stream processing applications achieve concurrency through asynchronicity. For example, incoming tuples of an operator can be queued if the operator is busy processing an earlier tuple. For systems with multiple processor cores, operators can be placed on different cores or machines so that the processing can be pipelined or executed in parallel. For the example in FIG. 1, the processing of operators W1 and W2 can be pipelined, and both can be executed in parallel with operator W3 when a tuple is received from the source.
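The queuing and pipelining described above can be sketched, under stated assumptions, with one thread per operator and a queue on each edge. The function run_operator, the STOP sentinel, and the two arithmetic stages are hypothetical and do not correspond to the actual operators of FIG. 1. In standard CPython, threads illustrate the queuing structure rather than true multi-core parallelism; placing operators in separate processes or machines would be needed for the latter.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def run_operator(fn, inbox, outbox):
    """Consume tuples from `inbox`, apply `fn`, and emit results to `outbox`.

    Tuples that arrive while the operator is busy simply wait in `inbox`.
    """
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate end-of-stream downstream
            return
        outbox.put(fn(item))


q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=run_operator, args=(lambda x: x + 1, q_in, q_mid)),
    threading.Thread(target=run_operator, args=(lambda x: x * 10, q_mid, q_out)),
]
for t in stages:
    t.start()

# The source pushes tuples without waiting for downstream operators.
for value in (1, 2, 3):
    q_in.put(value)
q_in.put(STOP)

for t in stages:
    t.join()

# Drain the final queue into a list of results.
results = []
while True:
    item = q_out.get()
    if item is STOP:
        break
    results.append(item)
```

With these assumed stages, results is [20, 30, 40]. While the second stage processes one tuple, the first stage can already work on the next, which is the pipelining effect; an operator with no dependency on either stage could run on its own thread and queue alongside them.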