Various platforms for load testing of websites and web-based applications are commercially available. For example, U.S. Pat. No. 7,844,026 describes a real-time analytics tool that allows businesses to gather and display live performance data obtained from running a complex test composition on a target website or web application. This tool performs a load test utilizing large numbers (e.g., hundreds of thousands) of virtual users, providing business intelligence data results that permit a business to pinpoint performance bottlenecks and potential areas of stress in a web application.
Organizations are also interested in the analysis of real user measurement (RUM) data, which captures the current experiences of real users as they visit and use a website or web application. For example, businesses engaged in e-commerce are often interested in a performance metric known as the “bounce rate”, which is the ratio of the number of visitors who leave a website immediately after viewing only one page to the number of users who click on an actionable item or icon (e.g., to place an item in a shopping cart). Since there is a strong correlation between the speed of a website (e.g., the time to load a webpage) and the probability of a user bouncing, real-time analytics that give businesses and developers insight into RUM across all browsers and locations are very valuable.
Online analytical processing (OLAP) of collected data has recently given rise to the use of analytic dashboards as a way to visualize key performance indicators of a website or web application. Dashboards usually consist of a series of graphics, charts, gauges and other visual indicators that can be monitored and interpreted. Analytical dashboards typically support interactions with the data, such as drilling down into the underlying details. One visual indicator typically found in dashboards is a histogram. A histogram is a type of graph widely used in statistics to visually interpret numerical data by indicating the number of data points that lie within a range of values, commonly referred to as a class or bin. The frequency of the data falling in each class is depicted by a bar whose height corresponds to the frequency of the data in that class. In other words, the higher the bar, the greater the frequency of the data; conversely, the lower the bar, the lower the frequency. The bars in a histogram are arranged and displayed in the order in which the classes occur.
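By way of illustration only, the binning that underlies a histogram can be sketched in a few lines of Python. The function name `histogram`, the fixed `bin_width`, and the sample page load times are hypothetical choices made for this sketch and do not come from any particular dashboard product. The sketch assumes non-negative values and equal-width classes.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall in each class [k*bin_width, (k+1)*bin_width)."""
    # Map every value to the index of the class that contains it.
    counts = Counter(int(v // bin_width) for v in values)
    # Return the occupied classes in ascending order, i.e. the order
    # in which the bars of a histogram are displayed.
    return [(k * bin_width, (k + 1) * bin_width, counts[k]) for k in sorted(counts)]

# Hypothetical page load times in milliseconds.
load_times_ms = [120, 340, 95, 410, 180, 220, 305, 150]

for low, high, count in histogram(load_times_ms, 100):
    # One '#' per data point, so a longer bar means a higher frequency.
    print(f"{low:>4}-{high:<4} {'#' * count}")
```

In this sketch a class that contains no data points is simply omitted; a dashboard would more likely draw it as a bar of zero height.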
One of the problems with providing visual indicators such as histograms in real-time analytic dashboards is that statistical calculations, such as percentiles, need to be performed in real-time, concurrently with the on-going collection of data, which can involve tens or hundreds of millions of real user measurements. For example, a typical way to compute a percentile is to first sort all of the data points in ascending order, i.e., from the smallest data point to the largest. The nth percentile is then determined by the corresponding location in the sorted order. By way of example, if 100 data points are sorted in ascending order, then the tenth percentile is the tenth data point in the order. But with extremely large data sets the computing power and memory needed to store and sort all of the data can quickly exceed reasonable bounds.
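The sort-based approach just described can be sketched as follows. This is a minimal illustration assuming the nearest-rank convention, which is one of several common percentile definitions. The function name `percentile_by_sorting` is hypothetical.

```python
import math

def percentile_by_sorting(values, p):
    """Nearest-rank percentile: sort all data points, then index by rank."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    # Sorting needs every data point in memory at once: O(n) space
    # and O(n log n) time, which is the cost noted in the text above.
    ordered = sorted(values)
    # 1-based rank of the p-th percentile under the nearest-rank rule.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# 100 data points supplied in descending order; sorting puts them in ascending order.
data = list(range(100, 0, -1))
print(percentile_by_sorting(data, 10))  # the tenth data point in sorted order
```

Because the whole data set must be held and re-sorted whenever new measurements arrive, the approach becomes impractical at the scale of tens or hundreds of millions of data points.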