Embodiments of the invention relate to encapsulation of multiplicity and sparsity in multidimensional query execution systems.
On-line analytical processing (OLAP) databases present data in a multidimensional data model. The multidimensional data model includes hierarchies. Hierarchies (also referred to as dimensions) are collections of related identifiers, or attributes (e.g., product, market, time, channel, scenario, or customer) of the data values of the system. The data values belonging to a particular hierarchy generally have some sort of structure, such as hierarchical. A multidimensional data model with three hierarchies may be represented as a cube.
Multidimensional expressions (MDX) may be described as a query language for OLAP databases. The MDX query language is also a calculation language that allows measures to be calculated (e.g., as a result of performing operations, such as adding/multiplying/subtracting/function/etc.). Multidimensional expressions are often referred to as MDX expressions. An MDX query may be referred to as a multidimensional query.
An MDX expression is evaluated for a collection of cells in a multidimensional cube. This collection of cells is called the context of the evaluation (i.e., “evaluation context”). A context tuple is the coordinates of a cell in the evaluation context. A context tuple may be represented using members, one from each hierarchy. The evaluation context may be represented by a set of context tuples.
Some systems allow for a block execution mode. In block execution mode, an MDX expression is evaluated once for the entire evaluation context, and the data is fetched from the data source in one query.
The following example MDX Query (1) for a calculated measure illustrates the concept of block execution and evaluation context:
MDX Query (1)WITH   MEMBER      [Measures].[C1] as ‘      [Measures].[Unit Sales]*1.2’SELECT   {[Measures].[C1]}   ON COLUMNS,   {[Product].[Product Family].MEMBERS}   ON ROWSFROM   [Sales]
In MDX Query (1), the WITH clause indicates a calculated measure. In particular, MDX expression “C1” is calculated as “Unit Sales*1.2”. The SELECT clause indicates that “C1” is to be listed for the columns and that members of the “Product Family” are to be listed for the rows. The FROM clause indicates that data is to be obtained from the “Sales” cube.
MDX Query (1) asks for the evaluation of calculated measure C1 for each of the three products in the [Product Family] level. In block execution mode, C1 is evaluated once for all three tuples in the set [Product].[Product Family].MEMBERS, and there is one data retrieval request of [Measures].[Unit Sales] for all three products. The evaluation context for the calculated measure C1 is the set representing the three cells.
The evaluation of an MDX expression in block execution mode may result in multiple values, with one value per cell. Some systems keep the multiple values for a calculated measure in an array and match the values to the evaluation context by their positions in the array. Because a value's position in the array does not fully specify the corresponding tuple in the evaluation context, matching based on the position requires reconstruction of the relationship between the value and the tuple as the evaluation context changes during the evaluation. Such matching is complicated and may cause erroneous results for certain MDX expressions.
MDX expressions that have partial or no reference to the evaluation context may result in multiplicity (i.e., the same value for all cells that differ only in members of the unreferenced hierarchies). That is, if a calculated measure does not reference a particular hierarchy of the evaluation context, then that particular hierarchy is said to be degenerated. Also, unreferenced hierarchies are called degenerated hierarchies for the MDX expression. The member M3 is said to have two operands: M1 and M2. For example, to calculate the MDX expression M3 in the MDX Query (2), first the MDX expression M1 and the MDX expression M2 are evaluated, and then the product of M1 and M2 is evaluated.
MDX Query (2)WITH   MEMBER      [Measures].[M1] as ‘      [Customers].CURRENTMEMBER.CHILDREN.COUNT’   MEMBER      [Measures].[M2] as ‘      [Product].CURRENTMEMBER.CHILDREN.COUNT’   MEMBER      [Measures].[M3]as ‘M1*M2’SELECT   {[Customers].[Country].MEMBERS}   ON COLUMNS,   {[Product].[Product Family].MEMBERS}   ON ROWSFROM   [Sales]WHERE   ([Measures].[M3])
In MDX Query (2), the WITH clause indicates that M1, M2, and M3 are calculated measures. In particular, “M1” is calculated as “CURRENTMEMBER.CHILDREN.COUNT”; “M2” is calculated as “CURRENTMEMBER.CHILDREN.COUNT”; and M3 is calculated as “M1*M2”. The SELECT clause indicates that members of “Country” are to be listed for the columns and that members of “Product Family” are to be listed for the rows. The FROM clause indicates that data is to be obtained from the “Sales” cube. The WHERE clause indicates that data returned by the SELECT statement should be filtered by the member M3 of the Measures dimension. Data in the intersection of the members returned by the SELECT clause and member M3 is returned.
The MDX expression for M1 evaluates to the same value for all three products because the calculation of M1 does not reference the members of the Product hierarchy on the evaluation context. This is an example of multiplicity. The resulting evaluation of M1 is illustrated in Table (1):
Table (1) for M1 (evaluated in MDX Query (2))CanadaMexicoUSADrink193Food193Non-193Consumable
Such repetitions consume extra evaluation time and additional memory, especially when the set for the product becomes very large. As mentioned above, some systems keep the multiple values for a calculated measure in an array. Some of these systems may handle multiplicity by omitting the repeated values in the array for the calculated measure, without keeping the full information of which values are omitted. Omitting repeated values requires matching of one value to multiple context tuples.
Operations among different MDX expressions (e.g., M1 and M2 of MDX Query (2)) require matching values that correspond to the same context tuple from the participating MDX expressions. Such matching compares arrays used to store values for evaluated MDX expressions, where repeated values may be omitted for different context tuples in different MDX expressions. In addition, the logic for matching values is implemented for each operator and function in the MDX language.
Data sparsity in multidimensional systems often causes MDX expressions to result in many empty cells. Some systems enumerate the context tuples and perform operations for the corresponding values. Performing operations on these empty cells may be expensive for large evaluation contexts.