Web analytics refers to the analysis of data associated with website visitation. For instance, web analytics can be used to mine visitor traffic data. A variety of visitor traffic data is measured, such as which browser is being used, which links on a given web page were selected, whether a product was purchased, and the like. A number of web analytics tools are presently available, such as Site Catalyst version 11 from Omniture of Orem, Utah. These tools capture data on website usage and, responsive to a user's request, display a variety of metrics on website usage, such as fallout/conversion, A/B testing, and the like.
Typically, such web analytics tools generate website visitation reports that are useful to website administrators and other individuals who wish to determine how many visitors a site is attracting, as well as the characteristics and behavior of those individuals.
In order to provide accurate statistical reporting on website visitation by a large number of visitors, sampling techniques are usually applied. A processing module monitors visits to a website, for example by consulting server logs, and performs a sampling operation to discard some of the visitation data while retaining a representative sample. This representative sample is then used in constructing reports to be presented to a user such as a website administrator.
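By way of illustration only, the sampling operation described above can be sketched as follows. The sketch assumes a fixed sampling rate and a hash-based selection keyed on a visitor identifier, so that all records for a retained visitor are kept together; the record layout and the names `keep_visitor`, `sample_log`, and `estimate_total_hits` are hypothetical and are not taken from any particular analytics tool.

```python
import hashlib

def keep_visitor(visitor_id: str, rate: float) -> bool:
    """Return True if this visitor falls inside the retained sample."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer in [0, 2**64) and
    # keep the visitor when it lands in the lowest `rate` fraction of that range.
    return int.from_bytes(digest[:8], "big") < rate * 2**64

def sample_log(records, rate):
    """Discard records of unsampled visitors; keep the rest unchanged."""
    return [r for r in records if keep_visitor(r["visitor_id"], rate)]

def estimate_total_hits(sample, rate):
    """Scale the retained count back up to estimate the full hit count."""
    return len(sample) / rate

# Hypothetical log: 1,000 distinct visitors with 5 page views each.
records = [{"visitor_id": f"v{i % 1000}", "page": "/home"} for i in range(5000)]
sample = sample_log(records, rate=0.1)
print(len(sample), "records retained; estimated total:",
      estimate_total_hits(sample, 0.1))
```

Because selection depends only on the visitor identifier, the same visitor is consistently retained or discarded across repeated runs, and reported counts are scaled by the inverse of the sampling rate.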
For certain types of websites, dramatic spikes in website visitation can take place frequently. For example, a website providing information on the NFL Super Bowl may experience a sizable increase in traffic during the time period surrounding the annual event. In some cases, such as those in which a normally obscure website is thrust into prominence by being linked to or referred to by a large number of high-visibility media sources, the traffic on a single “peak” day can exceed that of a typical day by orders of magnitude.
Conventional sampling mechanisms fail in such situations. A sampling rate that is appropriate for a website's normal level of traffic may be wholly inadequate for high-traffic days and may lead to system overloads and other performance problems. Using a lower overall sampling rate to account for high-traffic days can drastically reduce the size of the data set for normal-traffic days; the reduced data for normal-traffic days then provides insufficient resolution for reporting on website visitation on those days.
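The trade-off can be made concrete with a short calculation. All figures below (50,000 hits on a normal day, 5,000,000 hits on a peak day, and a processing capacity of 20,000 retained records per day) are hypothetical values chosen purely for illustration.

```python
# Hypothetical traffic and capacity figures, for illustration only.
NORMAL_DAY_HITS = 50_000
PEAK_DAY_HITS = 5_000_000
DAILY_CAPACITY = 20_000  # retained records the system can process per day

def retained(hits: int, rate: float) -> int:
    """Expected number of records kept at a fixed sampling rate."""
    return round(hits * rate)

# A rate suited to normal days, and a rate sized so a peak day fits capacity.
rate_for_normal = 0.2
rate_for_peak = DAILY_CAPACITY / PEAK_DAY_HITS  # 0.004

for label, rate in [("normal-day rate", rate_for_normal),
                    ("peak-day rate", rate_for_peak)]:
    print(f"{label} ({rate}): "
          f"normal day keeps {retained(NORMAL_DAY_HITS, rate)}, "
          f"peak day keeps {retained(PEAK_DAY_HITS, rate)}")
```

Under these assumed figures, the rate suited to normal days retains 1,000,000 records on the peak day, fifty times the assumed capacity, while the rate sized for the peak day retains only 200 records on a normal day.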
What is needed, therefore, is an improved sampling mechanism that provides variable sample rates for website visitation analysis. What is further needed is a sampling mechanism that provides sufficient resolution for normal-traffic days, but dynamically adjusts the sampling rate to account for high-traffic days.