Processing large volumes of data often presents usability, manageability, and scalability problems. As an example, Web sites generate gigabytes of data every day describing the actions taken by visitors to the sites. In fact, a popular network of Web sites can receive an average of 1.5 billion hits per day or more. This data has several dimensions, such as where each visitor came from, the time of day, the route taken through the site, and the like. Moreover, the amount of data continually increases as the number of Web services and the amount of business they conduct increase. Processing this large amount of data to produce meaningful usage reports and clickstream analysis for a network of sites involves overcoming several challenges.
In general, data warehouses are databases designed to support decision-making in an organization. The data is usually historical and static and may also contain numerous summaries. Typical data warehouses are batch updated on a periodic basis and contain vast amounts of data in relational tables summarized at different levels to improve query performance. Data warehouses are structured to support a variety of analyses, including elaborate queries on large amounts of data that can require extensive searching. In a typical data warehouse, fact tables contain the actual numeric metrics, such as a count of page views, that a user might be interested in viewing.
Fact tables usually have a plurality of foreign keys that relate to primary keys in dimension tables. This allows individual records of a fact table to be indexed or matched up to specific dimensional values. That is, given a set of dimensional values, corresponding metrics can be located. Continuing the page views example, suppose a user wishes to view data from the page views fact table. A domain dimension table allows the user to choose a single domain and then see only the data from the page views fact table that corresponds to that target domain. Similarly, a time dimension table allows the user to choose a single day and view only the data from the page views fact table that corresponds to the chosen target domain and the chosen date. Choosing the dimensions across which a user wants data to be summarized may be referred to as slicing the data. A definition of the relationship between tables in a data warehouse is usually called a schema.
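The relationship between a fact table and its dimension tables, and the slicing described above, can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module with an in-memory database. The table names (dim_domain, dim_date, fact_page_views), the column names, the helper page_views, and the sample rows are all hypothetical and chosen only for illustration; they are not taken from any particular data warehouse.

```python
import sqlite3

# Hypothetical schema: all table names, column names, and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_domain (domain_id INTEGER PRIMARY KEY, domain_name TEXT);
    CREATE TABLE dim_date   (date_id   INTEGER PRIMARY KEY, day TEXT);
    -- The fact table holds the numeric metric plus one foreign key per dimension.
    CREATE TABLE fact_page_views (
        domain_id  INTEGER REFERENCES dim_domain(domain_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        page_views INTEGER
    );
    INSERT INTO dim_domain VALUES (1, 'news.example.com'), (2, 'shop.example.com');
    INSERT INTO dim_date   VALUES (1, '2003-01-01'), (2, '2003-01-02');
    INSERT INTO fact_page_views VALUES (1, 1, 120), (1, 2, 150), (2, 1, 80), (2, 2, 95);
""")

def page_views(domain_name, day=None):
    """Sum page views for one domain, optionally narrowed to a single day."""
    sql = """
        SELECT COALESCE(SUM(f.page_views), 0)
        FROM fact_page_views AS f
        JOIN dim_domain AS d ON d.domain_id = f.domain_id
        JOIN dim_date   AS t ON t.date_id   = f.date_id
        WHERE d.domain_name = ?
    """
    params = [domain_name]
    if day is not None:
        # Add a second dimensional constraint: slice by date as well.
        sql += " AND t.day = ?"
        params.append(day)
    return conn.execute(sql, params).fetchone()[0]

print(page_views("news.example.com"))                # slice by domain only
print(page_views("news.example.com", "2003-01-02"))  # slice by domain and day
```

With the sample rows above, the first call totals the metric over every day for the chosen domain, and the second narrows the same fact table to a single domain and date. Each additional dimension in this sketch is just another join and another filter on a dimension table's key.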
Known reporting techniques involve creating heavily customized structured query language (SQL) statements for generating reports to answer a user's queries. Unfortunately, new customized procedures of this type are needed each time new report requirements arise. This leads to higher cost and greater development effort and results in less flexibility in how reports are produced.
Accordingly, a reporting engine is desired to address one or more of these disadvantages.