Examples of data streaming applications include applications that process data such as network traffic records, stock quotes, Web clicks, sensor data, and call records. One type of network traffic record is known as a NETFLOW record, which is a record generated in accordance with the NETFLOW protocol available from Cisco Systems, Inc. (San Jose, Calif.). NETFLOW and CISCO are trademarks of Cisco Systems, Inc.
Such data streams can generate hundreds of gigabytes of information each day. Processing such vast amounts of data can place a heavy load on the data processing system that performs the processing. The situation is exacerbated because analyzing huge volumes of data can require a large number of aggregate queries to be processed. As is known, an aggregate query is a query that performs an aggregate computation (e.g., summation, average, max, or min) on a given data set (e.g., a data stream). These queries may be generated by system administrators seeking to obtain information about the system.
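By way of illustration only, an aggregate query over a data stream may be sketched as follows. The sketch computes the count, summation, average, max, and min of the byte counts of traffic records, grouped by source address. The FlowRecord fields, the Aggregate class, and the aggregate_by_source function are hypothetical names assumed for this example; they do not represent the NETFLOW record layout or any particular query engine.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FlowRecord:
    # Hypothetical, simplified record; actual traffic records carry more fields.
    src_ip: str
    dst_ip: str
    num_bytes: int


@dataclass
class Aggregate:
    # Running state for one group; updated once per incoming record.
    count: int = 0
    total: int = 0
    maximum: int = 0
    minimum: int = 0

    def update(self, value: int) -> None:
        if self.count == 0:
            # First value seen for this group initializes max and min.
            self.maximum = value
            self.minimum = value
        else:
            self.maximum = max(self.maximum, value)
            self.minimum = min(self.minimum, value)
        self.count += 1
        self.total += value

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def aggregate_by_source(stream: Iterable[FlowRecord]) -> Dict[str, Aggregate]:
    """Single pass over the stream, keeping one Aggregate per source address."""
    groups: Dict[str, Aggregate] = defaultdict(Aggregate)
    for record in stream:
        groups[record.src_ip].update(record.num_bytes)
    return dict(groups)
```

In this sketch each incoming record is examined once and only a small running state is kept per group, so the stream itself need not be stored. Each additional aggregate query of this kind adds its own per-record work.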
Thus, for real-world deployment, scalability is a key requirement for these types of collection systems. Naïve query answering systems that process each query separately for every incoming record cannot keep up with the high stream rates.
Accordingly, what is required for scalability is an improved technique for processing data stream queries.