Database management systems are designed to process and manage massive “streams” of data that arrive at a very high rate. Due to resource constraints (e.g., limited memory and processing time), it becomes difficult even to “read” each new update to the stream, much less to store and process the update. This problem commonly arises in analyzing IP network traffic data. For example, new packets arrive at routers at very high rates, from hundreds of thousands to many millions every second. Network operators desire that stream summarization, such as characterizing the data distribution, tracking trends and mining for anomalous behavior, occur in real time. However, the space available to process the data stream is typically considerably smaller than the stream itself. In some cases, the space used during processing grows linearly with the input size (e.g., a portion of the stream), rapidly filling the space available. Thus, an analysis algorithm may not fully execute because the available space is exhausted. Summarization, though, is essential for substantially every aspect of network management, including billing, verifying service level agreements and network security.
Typical summarizing methods use one-dimensional descriptors such as quantiles (e.g., the median is the ½ quantile) to describe a customer's network. For example, rather than provide an average delay or throughput on the IP network for the customer, the network operator provides the quantiles of delay and throughput of the data flows associated with the customer to describe more robustly the quality of service provided to the customer. However, data streams typically represent multidimensional data, which cannot be described effectively using one-dimensional quantiles.
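As an illustration of why quantiles can describe delay more robustly than an average, the following minimal sketch compares the two on a small set of delay values. The delay values, the `quantile` helper and its nearest-rank definition are assumptions chosen for illustration only. They are not taken from any particular system.

```python
import math
import statistics


def quantile(values, q):
    """Return the q-quantile of values using the nearest-rank definition.

    Hypothetical helper for illustration. It sorts a full copy of the
    input, so its space use grows linearly with the input size.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in the interval (0, 1]")
    ordered = sorted(values)
    # Nearest rank: the smallest rank r such that r / n >= q.
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Assumed per-flow delays in milliseconds, with a single extreme outlier.
delays_ms = [10, 11, 12, 12, 13, 13, 14, 15, 16, 900]

mean_delay = statistics.mean(delays_ms)   # pulled far upward by the outlier
median_delay = quantile(delays_ms, 0.5)   # the 1/2 quantile
p90_delay = quantile(delays_ms, 0.9)      # the 9/10 quantile

print(f"mean={mean_delay} median={median_delay} p90={p90_delay}")
```

In this sketch the single outlier dominates the average, while the median and the 9/10 quantile stay close to the delays that most flows experience. The helper stores and sorts every value, which is the linear space growth that a streaming setting cannot afford, and it summarizes only one dimension at a time.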