Web analytics refers to the analysis of data associated with website traffic. For instance, web analytics can be used to mine visitor traffic data. A variety of visitor traffic data is measured, such as what browser is being used, what links on a given web page were selected, whether a product was purchased, and the like. There are a number of web analytics tools presently available, such as Site Catalyst version 11 from Omniture of Orem, Utah. These tools are able to capture data on website usage and, responsive to a user's request, display a variety of different metrics on website usage, such as fallout/conversion, A/B testing, and the like.
Typically, such web analytics tools generate website traffic reports that are useful to website administrators and other individuals who wish to determine how many visitors a site is attracting, as well as the characteristics and behavior of those individuals.
In order to provide accurate statistical reporting on website traffic by a large number of visitors, sampling techniques are usually applied. A processing module monitors visits to a website, for example by consulting server logs, and performs a sampling operation to discard some of the website traffic data while retaining a representative sample. This representative sample is then used in constructing reports to be presented to a user such as a website administrator.
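The sampling operation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each logged hit is a dictionary with a `visitor_id` field and that sampling is done per visitor by hashing that identifier. The function names, the field name, and the hash-based method are hypothetical choices, not taken from any particular analytics tool.

```python
import hashlib

def keep_visitor(visitor_id: str, rate: float) -> bool:
    """Return True if this visitor falls inside the sample (hypothetical helper)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the visitor when the bucket lies in the lowest `rate` fraction of the range.
    return bucket < rate * 2**64

def sample_hits(hits, rate):
    """Discard hits from unsampled visitors and retain the rest."""
    return [hit for hit in hits if keep_visitor(hit["visitor_id"], rate)]

# Illustrative data; the "visitor_id" and "page" fields are assumptions.
hits = [
    {"visitor_id": "v1", "page": "/home"},
    {"visitor_id": "v2", "page": "/home"},
    {"visitor_id": "v1", "page": "/cart"},
    {"visitor_id": "v3", "page": "/search"},
]
sampled = sample_hits(hits, 0.5)
```

Because the decision depends only on the visitor identifier, every hit from a retained visitor stays in the sample, so per-visitor behavior such as a path from page view to purchase remains intact in the sampled data.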
Raw data and/or sampled data describing website traffic are typically stored in a database or other data store that is accessible to a web analytics report generation system. Often, the amount of data to be stored is relatively large.
Existing techniques for data compression can be applied in order to reduce the size of the stored data. However, such techniques are usually not optimized for the storage of website traffic data, and therefore do not take advantage of particular characteristics of such data. Accordingly, existing techniques are not optimally effective in compressing website traffic data.
Furthermore, when generating reports it is often useful to be able to filter and/or sort data by reference to values, so as to present meaningful statistics as to website traffic patterns.
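As a simple illustration of filtering and sorting by reference to values, the following sketch counts purchases per browser and orders the result by count. The record layout (`browser` and `purchased` fields) and the function name are assumptions made for the example only.

```python
from collections import Counter

# Illustrative visit records; the field names are hypothetical.
visits = [
    {"browser": "Firefox", "purchased": True},
    {"browser": "Safari", "purchased": True},
    {"browser": "Firefox", "purchased": True},
    {"browser": "Opera", "purchased": False},
    {"browser": "Safari", "purchased": False},
]

def purchases_by_browser(visits):
    """Filter to visits that ended in a purchase, then rank browsers by count."""
    counts = Counter(v["browser"] for v in visits if v["purchased"])
    # Sort by descending count, breaking ties alphabetically by browser name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

report = purchases_by_browser(visits)
```

A report of this kind has to inspect the stored value of each field for every record, which is why the storage format affects how efficiently such queries run.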
What is needed, therefore, is a data compression technique that takes advantage of particular characteristics of website traffic data and thereby provides improved compression results. What is further needed is a data format for storage of website traffic data that facilitates a high degree of compression for such data. What is further needed is a data format that yields greater efficiency when filtering, sorting, and/or extracting selected data by reference to values.