Recent advances in hardware technology have made it possible to collect and process large amounts of data. For example, using a credit card or accessing a web page can automatically generate large numbers of data records. These dynamically growing data sets are often referred to as data streams. The speed at which data streams arrive constrains how data mining techniques can be applied to them. For example, a data stream cannot be re-examined during the computation, so algorithms must complete their work in a single pass over the data. Further, because the data arrive quickly, an effective model must be robust enough to be updated rapidly during the computation. Because of these requirements, standard data mining algorithms designed for static data sets cannot be easily adapted for use on data streams.
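The single-pass and rapid-update requirements described above can be illustrated with a minimal sketch. It is not taken from any method discussed in this document; it assumes a stream of numeric values and uses Welford's well-known online algorithm to maintain a running mean and variance. The names `RunningStats` and `summarize` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Summary that is updated in constant time per record (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each record is seen exactly once and is not stored afterwards.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the records seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterable once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because the summary holds only three numbers, it can be kept current as each record arrives, whereas an algorithm written for a static data set might assume it can scan the data repeatedly.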
An important problem in data stream computation is query processing. Such queries include, by way of example only, selectivity estimation of range queries. This problem has been explored in the context of data streams, but that work has been limited to methods designed for processing historical queries. Examples of such research are disclosed in A. Dobra et al., “Processing Complex Aggregate Queries over Data Streams,” ACM SIGMOD Conference, 2002; A. Dobra et al., “Sketch Based Multi-Query Processing Over Data Streams,” EDBT Conference, 2004; A. Gilbert et al., “Surfing Wavelets on Streams: One-pass Summaries for Approximate Aggregate Queries,” VLDB Conference, 2001; A. Gilbert et al., “How to Summarize the Universe: Dynamic Maintenance of Quantiles,” VLDB Conference, 2002; G. Manku et al., “Approximate Frequency Counts over Data Streams,” VLDB Conference, 2002; and J. Vitter et al., “Approximate Computation of Multidimensional Aggregates of Sparse Data using Wavelets,” ACM SIGMOD Conference, 1999, the disclosures of which are incorporated by reference herein.
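By way of illustration only, selectivity estimation of a range query over a stream can be sketched with a uniform reservoir sample. This is a generic textbook technique (reservoir sampling, Algorithm R), not the method of any of the references cited above, and it assumes one-dimensional numeric records. The class name `ReservoirSelectivity` and its parameters are hypothetical.

```python
import random


class ReservoirSelectivity:
    """Keeps a fixed-size uniform sample of a stream (reservoir sampling)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self._rng = random.Random(seed)

    def update(self, x):
        # Process one record; the stream itself is never revisited.
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            # Keep the new record with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = x

    def selectivity(self, low, high):
        """Estimate the fraction of records with low <= value <= high."""
        if not self.sample:
            return 0.0
        hits = sum(1 for v in self.sample if low <= v <= high)
        return hits / len(self.sample)
```

When the stream is no longer than the reservoir, the estimate is exact; otherwise it is an approximation whose accuracy depends on the sample size.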