1. Technical Field
The present invention relates to stream processing, and more particularly to processing new visualization queries for data within a running stream processing system.
2. Discussion of the Related Art
Stream processing is a technique for achieving high-performance computing in a distributed system consisting of multiple computers. Stream-based applications include market data feed processing and electronic trading, network and infrastructure monitoring, fraud detection, and command and control in military environments. A stream processing application comprises a graph of stream processing operators, where nodes of the graph represent operators performing tasks, and directed edges of the graph represent data flowing between operators. A stream processing operator may be nothing more than a piece of code that produces a data value at its output every time it is given a data value at its input. Streaming data is usually organized as sequences of tuples flowing asynchronously from operator to operator. A tuple is an ordered list of values, which may be of the same or different types. Once deployed, a stream processing application usually runs automatically for its entire lifetime without human interaction.
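The operator graph described above can be illustrated with a minimal sketch, assuming a single process and synchronous delivery (real stream processing systems pass tuples asynchronously between operators that may run on different computers). The `Operator` class, its `connect` and `push` methods, and the sample data are hypothetical names chosen for illustration only; each operator emits one output tuple per input tuple, and each `connect` call adds a directed edge to a downstream operator.

```python
from typing import Any, Callable, List, Tuple

# A stream tuple: an ordered list of values, possibly of different types.
StreamTuple = Tuple[Any, ...]


class Operator:
    """Hypothetical operator node: produces one output tuple per input tuple."""

    def __init__(self, name: str, fn: Callable[[StreamTuple], StreamTuple]):
        self.name = name
        self.fn = fn
        self.downstream: List["Operator"] = []  # directed edges to consumers
        self.received: List[StreamTuple] = []   # outputs kept only if no consumers

    def connect(self, other: "Operator") -> "Operator":
        """Add a directed edge self -> other and return other for chaining."""
        self.downstream.append(other)
        return other

    def push(self, t: StreamTuple) -> None:
        """Apply this operator to one tuple and forward the result downstream."""
        out = self.fn(t)
        if not self.downstream:
            self.received.append(out)
        for op in self.downstream:
            op.push(out)


# Usage: a two-operator chain that parses and then scales (symbol, price) tuples.
parse = Operator("parse", lambda t: (t[0], float(t[1])))
scale = Operator("scale", lambda t: (t[0], t[1] * 100))
parse.connect(scale)
for raw in [("AAA", "1.5"), ("BBB", "2.25")]:
    parse.push(raw)
print(scale.received)  # [('AAA', 150.0), ('BBB', 225.0)]
```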
The flow graphs of most stream processing applications are acyclic. A stream processing application receives external data, such as raw data collected and stored in advance or real-time data from sensors, through source operators, and sends its results through sink operators either to storage, such as files and databases, or to other applications, such as visualization tools.
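The structure of such a flow graph can be sketched as follows, assuming the graph is given as a mapping from each operator to the set of operators it reads from. Source operators are those with no inputs, sink operators are those whose output no other operator consumes, and acyclicity is checked with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The operator names and helper functions are hypothetical and serve only as an illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical flow graph: operator -> set of operators it receives data from.
# Every operator is assumed to appear as a key.
flow_graph: Dict[str, Set[str]] = {
    "src": set(),
    "filter": {"src"},
    "aggregate": {"filter"},
    "sink": {"aggregate"},
}


def sources(graph: Dict[str, Set[str]]) -> List[str]:
    """Source operators have no incoming edges."""
    return sorted(op for op, preds in graph.items() if not preds)


def sinks(graph: Dict[str, Set[str]]) -> List[str]:
    """Sink operators have no outgoing edges (nobody consumes their output)."""
    consumed = {p for preds in graph.values() for p in preds}
    return sorted(op for op in graph if op not in consumed)


def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """prepare() raises CycleError if the graph contains a cycle."""
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False


def deployment_order(graph: Dict[str, Set[str]]) -> List[str]:
    """An order in which every operator follows all of its upstream operators."""
    return list(TopologicalSorter(graph).static_order())


print(sources(flow_graph), sinks(flow_graph))  # ['src'] ['sink']
print(deployment_order(flow_graph))  # ['src', 'filter', 'aggregate', 'sink']
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
```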
Stream processing results are generally very large and are generated at high speed. The results, for example the internal states accumulated by a stream processing application, consist of a sequence of values of the same type or a sequence of tuples. To query and/or visualize a data stream, existing solutions store data streams in databases or other kinds of physical storage. Client applications then query the database to retrieve the stored data streams of interest and visualize them.
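The store-then-query approach of existing solutions can be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the physical storage. The table schema and the `sink_to_db` and `client_query` functions are hypothetical. The sketch shows that every result tuple must be written to storage by the sink before a client application can retrieve it.

```python
import sqlite3
from typing import List, Tuple

# Stand-in for the database: one row per result tuple (stream, timestamp, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (stream TEXT, ts INTEGER, value REAL)")


def sink_to_db(stream: str, tuples: List[Tuple[int, float]]) -> None:
    """Sink side (hypothetical): persist every (timestamp, value) result tuple."""
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [(stream, ts, v) for ts, v in tuples],
    )
    conn.commit()


def client_query(stream: str, since: int) -> List[Tuple[int, float]]:
    """Client side (hypothetical): fetch stored tuples of interest to visualize."""
    cur = conn.execute(
        "SELECT ts, value FROM results WHERE stream = ? AND ts >= ? ORDER BY ts",
        (stream, since),
    )
    return cur.fetchall()


sink_to_db("load", [(1, 0.5), (2, 0.75), (3, 0.25)])
sink_to_db("latency", [(1, 12.0)])
print(client_query("load", 2))  # [(2, 0.75), (3, 0.25)]
```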
For high-performance stream processing, it is generally not possible to store all data, because the states can change at a rate higher than the mass storage can handle, and the total amount of data may exceed the available storage space if a task runs for a long time. In addition, multiple client applications may want to visualize different data streams simultaneously, which further increases the load on the mass storage. Other reasons for not storing internal states may include complicated data structures, on-line processing requirements, and a demand for low latency.