1. Field of the Invention
The present invention relates to specifying multidimensional calculations for a relational on-line analytical processing (OLAP) engine.
2. Description of the Related Art
On-line analytical processing (OLAP) has become increasingly popular. Instead of reviewing piles of static reports printed on green-bar paper, an OLAP analyst can explore business results interactively, dynamically adjusting the view of the data and asking questions and getting answers almost immediately. This freedom from static answers to fixed questions on a fixed schedule allows business analysts to operate more effectively and to effect improvements in business operations.
Nigel Pendse introduced the term “FASMI” to characterize OLAP systems. The FASMI characteristics are: Fast, Analysis, Shared, Multidimensional, and Information. For further information, see N. Pendse, “What Is OLAP?” The OLAP Report.
As for fast, in keeping with the spirit of the “O” in OLAP, such systems need to provide results very quickly usually in just a few seconds, and seldom in more than 20 or 30 seconds. This level of performance is key in allowing analysts to work effectively without distraction.
As for analysis, considering the “A” in OLAP, OLAP systems generally provide rich analytic functions appropriate to a given application, with minimal programming.
As for shared, an OLAP system is usually a shared resource. This means that there is a requirement for OLAP systems to provide appropriate security and integrity features. Ultimately, this can mean providing different access controls on each cell of a database.
As for multidimensional, multidimensionality is the primary requirement for an OLAP system. OLAP products present their data in a multidimensional framework. Dimensions are collections of related identifiers, or attributes (product, market, time, channel, scenario, or customer, for example) of the data values of the system. The identifiers (“The Lord of the Rings-DVD,” “San Jose, Calif.,” “2002,” “Retail Rental,” and “John Q. Public,” for example) belonging to the collection for a particular dimension generally have some sort of structure, such as hierarchical. Sometimes there is more than one natural structure for these identifiers.
The multidimensional characteristic means that an OLAP system can quickly switch among various orientations of dimensions, as well as among various subsets and structural arrangements of a dimension. Because of the multidimensional nature of OLAP systems, the collections of data that they implement are referred to as cubes. As for information, OLAP systems store and calculate information. Data for OLAP systems often come from one or more operational systems. Analytical models are applied to these data, and the results are either stored in the system or generated at query time. The quantity of information that a particular OLAP system can manage is one characteristic of that system.
Enterprises have been storing multidimensional data, using a star or snowflake schema, in relational databases for many years. Over time, relational database vendors have added optimizations that enhance query performance on these schemas. During the 1990s many special purpose databases were developed that could handle added calculational complexity and that generally performed better than relational engines.
Multidimensional OLAP (MOLAP) refers to the family of OLAP systems in which special-purpose file systems or indexes are used to store cube data. Express Web Publisher, Essbase™, TM1, and Pilot Suite are a few examples of products based on special-purpose storage and indexing technology. Microsoft's OLAP offering also includes a MOLAP engine. These systems are often read-only systems that are loaded with base data periodically, then derived results are calculated, stored, and indexed. Scalability of MOLAP systems is often limited by the size of the batch window within which derived results are calculated and stored. To improve scalability, such systems often have a means for deferring calculation of some derived results until query time.
For relational OLAP (ROLAP), star schemas have been used for many years as a means for representing multidimensional data in a relational database. Many commercial software development companies, such as MicroStrategy, Brio, Business Objects, Metacube, Hyperion, and Metaphor, have developed batch or interactive multidimensional reporting and exploration interfaces for relational star schemas. These systems were all designed and implemented before super aggregate operators were added to the Structured Query Language (SQL) language definition.
In particular, until a few years ago, relational databases allowed the calculation of aggregates at only a single level per query. For example, one SELECT statement with a GROUP BY clause would be used to retrieve a result set at a quarter level (i.e., for a set of quarters), while another SELECT statement with a GROUP BY clause would be used to retrieve a result set at a month level (i.e., for a set of months). This forced relational OLAP systems to run multiple queries against the database in order to calculate cells at varying levels.
To facilitate OLAP-type query creation and provide more advanced optimizations, a DB2® Relational Database Management System (RDBMS), available from International Business Machines Corporation, implemented three new super aggregate operators that were added to the SQL standard to allow a single query to generate multiple aggregates: ROLLUP, CUBE, and GROUPING SETS. These super aggregate operators are extensions to the GROUP BY clause and specify that aggregates be generated at multiple levels. For example, one SELECT statement may be used to obtain a result set of calculations of aggregates at multiple levels (e.g., both quarter and month).
Note that these super aggregate operators are more than mere shorthand for generating multiple grouping sets. Because multiple grouping sets are requested in a single statement, the DB2® RDBMS can build an execution plan that generates all the grouping sets in such a way that each input row needed for the calculation is referenced only once. This can result in performance improvements of orders of magnitude, especially when the set of input rows does not fit in the buffer pool (i.e., cache).
Prior art systems are designed to produce multidimensional reports showing results with different levels of granularity by issuing multiple queries. Multiple result sets are obtained for the multiple queries, and the result sets are merged to form a single report. Such systems depend on some sort of description (metadata) of the roles for the tables and columns in a star schema for generating the necessary SQL statements to retrieve the data to produce the multidimensional reports. The precise metadata varies from product to product.
Multidimensional on-line analytical processing (OLAP) systems (e.g., from companies such as Hyperion, Cognos, and Microsoft) are designed to return multidimensional result sets naturally, when given sets of members for each edge of a multidimensional cube. The multidimensional OLAP systems are also designed to compute some or all of the results in advance of any query.
Multidimensional analysis has been done using SQL since the introduction of relational databases, but relational OLAP systems have not been able to return multidimensional results sets naturally or compute some or all of the results in advance of a query.
Thus, there is a need in the art for an improved relational OLAP system.