Quantiles are useful in characterizing the data distribution of evolving data sets, and are used in many applications, such as database applications, network monitoring applications, and the like. In many such applications, quantiles need to be tracked dynamically over time. In database applications, for example, operations on records in the database, e.g., insertions, updates, and deletions, change the quantiles of the data distribution. Similarly, in network monitoring applications, anomalies in data streams need to be detected as the data streams change dynamically over time. Computing quantiles on demand is expensive, and recomputing quantiles periodically can be prohibitively costly as well. Therefore, it is desirable to track quantiles of the data distribution incrementally.
Most incremental quantile estimation algorithms are based on a summary of the empirical data distribution, using either a representative sample of the distribution or a global approximation of the distribution. In such incremental quantile estimation algorithms, quantiles are computed from the summary data. Disadvantageously, however, in order to obtain quantile estimates with good accuracy (especially for tail quantiles, for which the accuracy requirement tends to be higher than for non-tail quantiles), a large amount of summary information must be maintained, which tends to be expensive in terms of memory. Furthermore, for continuous data streams having underlying distributions that change over time, a large bias in the quantile estimates may result because most of the summary information is out of date.
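By way of illustration only, the sample-based variant of the summary approach described above may be sketched as follows. The sketch assumes reservoir sampling as the sampling scheme, which is one common choice and is not named in this description; the class name `ReservoirQuantiles` and its parameters are hypothetical. Only insertions are handled.

```python
import random


class ReservoirQuantiles:
    """Hypothetical sample-based summary: keeps a uniform random sample
    of at most `capacity` values and reads quantiles from that sample."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []          # the summary kept in memory
        self.seen = 0             # number of values inserted so far
        self._rng = random.Random(seed)

    def insert(self, x):
        """Reservoir sampling: each value seen so far remains in the
        sample with probability capacity / seen."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = x

    def quantile(self, p):
        """Estimate the p-quantile (0 <= p <= 1) from the sample."""
        if not self.sample:
            raise ValueError("no data inserted yet")
        ordered = sorted(self.sample)
        index = min(len(ordered) - 1, int(p * len(ordered)))
        return ordered[index]
```

In such a sketch, the accuracy of a tail estimate depends on how many sampled values fall in the tail, so `capacity` must grow as the required tail accuracy increases. Because the sample weights old and new values equally, it also lags behind a distribution that changes over time.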
By contrast, other incremental quantile estimation algorithms use stochastic approximation (SA) for quantile estimation, in which the data are viewed as random quantities drawn from a data distribution. The SA-based quantile estimation algorithms do not keep a global approximation of the distribution and, thus, use negligible memory for estimating tail quantiles. Disadvantageously, however, the existing SA-based quantile estimation algorithms derive each quantile estimate individually, in isolation, which causes problems in incremental quantile estimation. First, derivation of the quantile estimates individually often leads to a violation of the monotone property of quantiles (e.g., where the value of the 90% quantile is less than the value of the 80% quantile). Second, although the incremental nature of the SA-based quantile estimation algorithms is amenable to continuous data updates, their use of derivative information renders them sensitive to data order and to the particular data distribution during intermediate updates. Third, the existing SA-based quantile estimation algorithms cannot handle dynamic underlying data distributions. These and other issues associated with existing SA-based quantile estimation algorithms present challenges for applications in which incremental quantile tracking is performed.
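By way of illustration only, the per-quantile nature of an SA-based update, and the resulting possibility of a monotonicity violation, may be sketched as follows. The sketch assumes a basic Robbins-Monro style update with a step size of `step_scale / n`. It omits the derivative (density) information that the existing algorithms are described as using, so it is a simplified stand-in rather than any particular existing algorithm. The function names and parameters are hypothetical.

```python
import random


def sa_quantile_update(estimate, x, p, step):
    """One SA step toward the p-quantile: move up by step * p when x lies
    above the estimate, and down by step * (1 - p) otherwise."""
    indicator = 1.0 if x <= estimate else 0.0
    return estimate + step * (p - indicator)


def track_quantiles(stream, probs, initial=0.0, step_scale=1.0):
    """Track each quantile in `probs` independently, in isolation.
    Memory use is one number per tracked quantile."""
    estimates = [initial] * len(probs)
    for n, x in enumerate(stream, start=1):
        step = step_scale / n  # assumed decreasing step size
        estimates = [
            sa_quantile_update(q, x, p, step)
            for q, p in zip(estimates, probs)
        ]
    return estimates


if __name__ == "__main__":
    # A single observation falling between the two current estimates
    # pushes the 80% estimate up and the 90% estimate down.
    q80 = sa_quantile_update(0.00, 0.02, 0.8, 1.0)
    q90 = sa_quantile_update(0.05, 0.02, 0.9, 1.0)
    print(q80, q90)  # q90 ends up below q80

    rng = random.Random(0)
    stream = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    print(track_quantiles(stream, [0.8, 0.9]))
```

Because each estimate is updated without reference to its neighbors, nothing in the sketch prevents the 90% estimate from crossing below the 80% estimate, as the first two updates in the usage example show.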