1. Field of the Invention
This invention relates to on-line analytical processing (OLAP) systems and, in particular, to an all-main memory OLAP system. The invention relates to systems, methods, data structures and/or computer programs for implementing an OLAP system that enable increased query execution speed, require reduced data storage capacity and/or facilitate scaling of an OLAP system to large data cubes.
2. Related Art
There are two major categories of database systems: on-line transaction processing (OLTP) systems and on-line analytical processing (OLAP) systems. Typically, data in OLTP systems is updated often and accessed with simple queries, while data in OLAP systems is updated infrequently and accessed with complex queries. An online travel store, for example, could update its OLTP system for each ticket purchase, while its OLAP system, which is used by the store's management to analyze ticket purchases over time based on customer profiles, is updated once a week. OLAP systems are also known as multidimensional database systems.
Minimizing query execution time is important for both OLTP and OLAP systems. To speed up query execution, conventional OLAP systems pre-calculate aggregations (or materialized views) of the data they manage. Queries that can be answered using stored aggregations alone execute very quickly. In practice, however, it is typically infeasible to pre-calculate all aggregations necessary to make each query fast. As a result, OLAP systems are typically characterized by long query execution times.
One approach that has been used to speed up query execution in OLTP systems is to store all data of a database in main memory, instead of only storing all of the data on a non-volatile data storage medium such as a magnetic disk (or other data storage medium from which data can be accessed less quickly than from main memory), and optimize query execution procedures for all-main memory operation. For an OLAP system, however, an all-main memory approach is made difficult by the typically very large amount of data with which the system interacts.
FIGS. 1 and 2 illustrate aspects of a conventional OLAP system, many of which can also be used with the present invention, as will be appreciated from the description below. FIG. 1 is a block diagram illustrating a conventional OLAP system and the operation thereof. FIG. 2 illustrates an example of multidimensional data that could be stored in an OLAP system, the multidimensional data described by a cube (also referred to as a fact table), an aggregation and a dimension hierarchy of the OLAP system.
Referring to FIG. 1, an OLAP system 100 includes a query executer 101, an updater 102 and a database 103 including multidimensional data. To use the OLAP system 100, a client application 110 (which can be any computer program(s), such as the online travel store discussed above, the use of which entails ascertaining information regarding the data stored in a database) submits a query to the query executer 101. After parsing the query, the query executer 101 calculates the query result, using the multidimensional data stored in the database 103 (which can include a cube 103a and aggregations 103b, as discussed further below with respect to FIG. 2). The updater 102 updates the multidimensional data in the database 103 on an ongoing basis (which is relatively infrequently, typically daily or weekly) based on the current state of the data warehouse 120, which is continually revised as new data is produced and/or existing data changes. The update process can be sped up by using incremental updating, which considers only the data that has been added to the data warehouse 120 since the last update.
FIG. 2 illustrates an example of multidimensional data that might be generated by a chain store selling personal computers (PCs) and printers. The multidimensional data is stored in a cube 201, a part of which is shown in FIG. 2. Each row of the cube 201 represents a record. Each record (records 201a, 201b, 201c and 201d are shown in FIG. 2) represents the sales of a particular item (PC or printer) in a particular store (data for two stores, store A and store B, is shown in the part of the cube illustrated in FIG. 2) for a particular month (data for two months, January and February, is shown in the part of the cube illustrated in FIG. 2). Each column of the cube 201 represents a data variable of interest. The columns of the cube in a conventional OLAP system can be of two types: dimensions and measures. A dimension is a column of a cube that can be eliminated (“rolled up”) when an aggregation is calculated. (At least one dimension is rolled up for each aggregation.) A measure is a column of a cube whose values are aggregated. Aggregations include operations such as, for example, “sum,” “min” and “count.” In FIG. 2, an aggregation 202 is illustrated (for the part of the cube illustrated in FIG. 2) in which sales of each item at each store are summed over all times (the dimension Time is rolled up and the measure Sales is summed).
The values of a dimension can be arranged in a hierarchy having several levels. Each dimension typically has at least two levels: an “All” level that includes all the dimension values and one or more levels that each include subsets of the dimension values. As illustrated in FIG. 2, the values of the Time dimension are arranged in a hierarchy 203 including three levels: All, Year and Month. The number of possible aggregations of a cube is the product of the numbers of levels for each dimension, e.g., if a cube includes first, second and third dimensions that are arranged in hierarchies of 3, 4 and 4 levels, respectively, the number of possible aggregations of the cube is 48 (3×4×4). The Time dimension, for example, contributes a factor of three to the number of possible aggregations of the cube illustrated in FIG. 2, corresponding to whether the Time dimension is rolled up to the All, Year or Month level, i.e., aggregated over all time, aggregated by year or aggregated by month, respectively. Typically, the number of possible aggregations of a cube is huge, making pre-computing all aggregations impractical. As indicated above, this inhibits the effectiveness of previous approaches to speeding up query execution in conventional OLAP systems by pre-calculating aggregations.