Quantiles are useful in characterizing the data distribution of evolving data sets. For example, quantiles are useful in many applications, such as in database applications, network monitoring applications, and the like. In many such applications, quantiles need to be tracked dynamically over time. In database applications, for example, operations on records in the database, e.g., insertions, deletions, and updates, change the quantiles of the data distribution. Similarly, in network monitoring applications, for example, anomalies on data streams need to be detected as the data streams change dynamically over time. Computing quantiles on demand is quite expensive, and, similarly, computing quantiles periodically can be prohibitively costly as well. Thus, it is desirable to compute quantiles incrementally in order to track quantiles of the data distribution.
Most incremental quantile estimation algorithms are based on a summary of the empirical data distribution, using either a representative sample of the distribution or a global approximation of the distribution. In such incremental quantile estimation algorithms, quantiles are computed from summary data. Disadvantageously, however, in order to obtain quantile estimates with good accuracies (especially for tail quantiles, for which the accuracy requirement tends to be higher than for non-tail quantiles), a large amount of summary information must be maintained, which tends to be expensive in terms of memory. Furthermore, for continuous data streams having underlying distributions that change over time, a large bias in quantile estimates may result since most of the summary information is out of date.
By contrast, other incremental quantile estimation algorithms use a stochastic approximation (SA) for quantile estimation, in which the data is viewed as being quantities from a random data distribution. The SA-based quantile estimation algorithms do not keep a global approximation of the distribution and, thus, use negligible memory for estimating tail quantiles. Disadvantageously, however, the existing SA-based quantile estimation algorithms are only valid for a single record type (namely, insertion records), and are unable to handle multiple record types, such as when insertion records are accompanied by one or more of deletion records, correction records, and update records.