Data processing in a large-scale enterprise application often presents usability, manageability, and scalability problems because of the large volume of data involved. For example, Web sites generate gigabytes of data every day describing the actions taken by visitors to the sites. In fact, the average number of hits on a popular network of Web sites can reach 1.5 billion per day or more. This data has several dimensions, such as where each visitor came from, the time of day, the route taken through the site, and the like. Moreover, the amount of data continually increases as the number of Web services and the amount of business they conduct increase. Therefore, processing this data to produce meaningful usage reports and clickstream analysis for a network of sites involves overcoming several challenges.
Online analytical processing (OLAP) is well known to those skilled in the art for handling relatively complex database queries in a multidimensional database. In general, OLAP applications model data as a multidimensional database, often referred to as a data cube, and permit access to the data for functions such as summarizing, consolidating, performing calculations on, and indexing the data. To create an OLAP cube from a collection of data, some attributes associated with the data are identified as facts (the values to be measured) while others are used as dimensions (the categories by which the facts are organized). A dimension usually arranges data according to a hierarchy to provide different levels of granularity for viewing the data.
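By way of illustration only, the relationship between facts, dimensions, and hierarchy levels can be sketched in Python as follows. The records, the attribute names (`domain`, `page`, `month`, `day`, `hits`), and the `roll_up` helper are all hypothetical and are not taken from any particular OLAP product; the sketch merely sums one fact at a chosen level of each dimension hierarchy.

```python
from collections import defaultdict

# Hypothetical clickstream records. "hits" is the fact (measure); the
# remaining attributes belong to the "site" and "time" dimensions.
RECORDS = [
    {"domain": "example.com", "page": "/index.html",
     "month": "2003-01", "day": "2003-01-01", "hits": 120},
    {"domain": "example.com", "page": "/news.html",
     "month": "2003-01", "day": "2003-01-01", "hits": 30},
    {"domain": "example.com", "page": "/index.html",
     "month": "2003-01", "day": "2003-01-02", "hits": 80},
    {"domain": "example.org", "page": "/index.html",
     "month": "2003-01", "day": "2003-01-02", "hits": 50},
]

# Each dimension is a hierarchy, listed from coarsest to finest level.
HIERARCHIES = {
    "site": ["domain", "page"],
    "time": ["month", "day"],
}


def roll_up(records, levels, fact="hits"):
    """Sum the fact at the requested level of each dimension.

    levels maps a dimension name to a level name,
    e.g. {"site": "domain", "time": "month"}.
    """
    totals = defaultdict(int)
    for rec in records:
        key = []
        for dim, level in levels.items():
            path = HIERARCHIES[dim]
            depth = path.index(level) + 1
            # A member is identified by its full path from the top level.
            key.append(tuple(rec[attr] for attr in path[:depth]))
        totals[tuple(key)] += rec[fact]
    return dict(totals)


# Coarse "overview" view: hits per domain per month.
overview = roll_up(RECORDS, {"site": "domain", "time": "month"})
# Fine "detail" view: hits per page per day.
detail = roll_up(RECORDS, {"site": "page", "time": "day"})
```

In this sketch, the same records answer both a domain-level query and a page-level query; only the hierarchy level selected for each dimension changes.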
Unfortunately, the amount of data and the number of physical entities (e.g., HTML pages, Web site directories) involved in network Web usage reporting have grown faster than conventional OLAP products and user interface tools can handle, which prevents them from performing satisfactorily on both the server and client sides. For example, in a large-scale enterprise implementation of an OLAP application, large dimensions (e.g., those having more than 500,000 members) present problems in terms of development and operation for a production system. Two significant factors that influence the design of a large-scale OLAP application are the scalability of the application on the server side and its usability for users working with a client tool.
Large dimensions generally cause the performance and usability problems described above. First, most commercial OLAP implementations require dimensions to be loaded into memory to improve query-time performance. A large dimension does not scale well because of the limit on the addressable memory space available in a hardware platform. Second, client machine memory and CPU cycles, as well as the inherent problems of presenting a large number of selections to users, limit usability. In this regard, users are unable to navigate through thousands of dimension members in any presently available client to find the members of interest.
Presently available OLAP implementations permit only fact-based partitioning of data and do not support dimension-based partitioning strategies to mitigate the problems caused by large dimensions. Therefore, improvements in data processing are desired to reduce processing time for large databases and to provide “overview” reporting (e.g., at the domain level) while still enabling site-specific groups to review business performance data at a detailed level (e.g., at the page level). Further improvements in manageability and usability are also desired.
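The distinction between the two partitioning strategies may be illustrated, purely as a sketch under stated assumptions, in Python. The fact rows, the choice of day as the fact-based partitioning key, and the choice of domain as the dimension-based partitioning key are hypothetical; the sketch reflects one plausible reading of the terms and does not describe any particular product or implementation.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, page member, hits).
FACTS = [
    ("2003-01-01", "example.com/index.html", 120),
    ("2003-01-01", "example.org/index.html", 50),
    ("2003-01-02", "example.com/news.html", 30),
    ("2003-01-02", "example.org/about.html", 10),
]


def partition(facts, key_of):
    """Group fact rows into partitions according to key_of(row)."""
    parts = defaultdict(list)
    for row in facts:
        parts[key_of(row)].append(row)
    return dict(parts)


def domains_in(rows):
    """Return the domains whose page members a partition refers to."""
    return {page.split("/", 1)[0] for _, page, _ in rows}


# Fact-based partitioning (assumed here to be by day): the fact rows are
# split, but each partition can still refer to members from every domain,
# so the page dimension itself is not made any smaller.
by_day = partition(FACTS, lambda row: row[0])

# Dimension-based partitioning (assumed here to be by domain): each
# partition refers only to the page members of a single domain.
by_domain = partition(FACTS, lambda row: row[1].split("/", 1)[0])
```

In this sketch, every day partition still spans both domains, whereas each domain partition needs only its own slice of the page dimension.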