In many applications, such as online authentication and cyber security, the value of a specific risk score is hard to interpret in isolation and should be considered relative to the population and/or the user/entity history. In other words, it is necessary to compare the risk score with the relevant score distribution. In this way, the risk involved in a specific event (e.g., a financial transaction, a login attempt, or a server access) is determined by the percentile of the risk score within that distribution.
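As an illustration only, the following minimal Python sketch shows how the same raw score can map to different percentiles depending on the reference distribution. The function name `percentile_rank` and the sample scores are hypothetical and are not taken from any particular system; the sketch also assumes the reference scores fit in memory, which is not the case in the streaming setting discussed below.

```python
from bisect import bisect_right


def percentile_rank(score, reference_scores):
    """Return the percentage of reference scores that are <= score."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference values are <= score.
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical reference distributions (illustrative values only).
population = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
user_history = [0.05, 0.1, 0.1, 0.15, 0.2]

print(percentile_rank(0.7, population))    # 80.0
print(percentile_rank(0.7, user_history))  # 100.0
```

Here a score of 0.7 is unremarkable relative to the population but exceeds everything previously seen for the user, which is why the relevant reference distribution matters.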
Quantiles divide the range of a distribution into contiguous intervals with substantially equal probabilities. Computing quantiles for the tail of a distribution over large data streams is known to be a hard problem, for which naïve approaches, such as storing and sorting every observed value, are not feasible. Estimating quantiles, rather than computing them exactly, allows a quantile query to be answered in a space-efficient manner. In the targeted quantile problem, the required accuracy of the quantile estimate varies across percentiles. In security applications, for example, high accuracy is desired at the tail of the distribution (typically associated with higher security threats), while moderate accuracy is sufficient at the center of the distribution. This relaxation of the quantile problem allows a further reduction in the space requirements of the algorithm. Finally, the distribution of values may change over time.
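The space saving obtained by estimating quantiles can be illustrated with a generic baseline, reservoir sampling, which keeps a fixed-size uniform sample of the stream and answers quantile queries from that sample. This is a sketch under stated assumptions and not the algorithm addressed by this disclosure: the class name `ReservoirQuantile` and its parameters are hypothetical, the sample treats all percentiles alike (it is not a targeted-quantile method), and it does not adapt to a distribution that changes over time.

```python
import random


class ReservoirQuantile:
    """Approximate quantiles from a fixed-size uniform sample of a stream."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # maximum number of values retained
        self.sample = []              # the reservoir itself
        self.count = 0                # number of values observed so far
        self._rng = random.Random(seed)

    def update(self, value):
        """Process one stream element using O(capacity) memory overall."""
        self.count += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            # Replace a random slot with probability capacity / count,
            # which keeps the reservoir a uniform sample of the stream.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = value

    def query(self, p):
        """Return the estimated p-quantile, with 0 <= p <= 1."""
        if not self.sample:
            raise ValueError("no data observed")
        ordered = sorted(self.sample)
        idx = min(len(ordered) - 1, int(p * len(ordered)))
        return ordered[idx]


# Usage: 100,000 simulated scores, of which only 1,000 are ever stored.
stream_rng = random.Random(0)
estimator = ReservoirQuantile(capacity=1000, seed=1)
for _ in range(100_000):
    estimator.update(stream_rng.random())

print(estimator.query(0.5), estimator.query(0.99))
```

Because only a small fraction of the retained sample falls in the tail, estimates at extreme percentiles rest on few values, which is one reason uniform sampling is a poor fit when high accuracy is needed there.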
A need remains for an improved algorithm for estimating quantiles and/or percentiles over streaming data, that is, data that cannot be stored in full and processed afterwards.