Although tabular databases can be used to store many types of information, they are generally limited to representing data in two dimensions. Many users, however, have a need for data having several dimensions. For those purposes, multi-dimension databases are much more versatile than tabular or relational databases. In multi-dimensional databases, data values are located at points in an n-dimensional conceptual space defined by specific positions along one or more axes. For example, three axes specifying month, product and sales district would delineate a three-dimensional space containing sales data. Each individual dollar value for sales would be identified by the combination of one specific point along each axis: the sales value for a specific month, product and sales district. The n-dimensional space can be visualized as a data cube.
In a typical multidimensional data cube, a storage location is allocated for every possible combination of every dimensional value. For example, in a three-dimensional data cube wherein each dimension has three possible values, 27 storage locations are allocated, one for each possible combination of the dimensional values. The lowest level detail data is typically provided by data entry and each upper level data cell is then computed by aggregating the detail data to fill in the data cube. To speed access time for users, the data cube can be stored in main memory.
In many multi-dimensional data cubes, especially large ones with many different dimensions, there are certain combinations of dimensional values for which there are relatively few data values. For example, if a data cube contains sales values dimensioned by product, region, and time, there may not be any sales values for certain products in certain regions during at least some time periods. In large multi-dimensional databases having several dimensions, it is fairly common for most of the data cube to be empty. Indeed, certain financial data may be much less than one percent populated.
Dimensions having many dimensional value combinations for which there are relatively few data values are referred to as "sparse." For sparse dimensions, it is wasteful to allocate storage space for each possible combination of dimensional values because many of the combinations will contain no data values. Instead, storage space can be allocated only for those dimensional combinations having data values. The problem of efficiently allocating storage space in a multi-dimensional data cube is referred to as "sparsity management."
A typical technique to implement sparsity management is to combine the sparse dimensions into a single dense dimension. For example, when not all products are sold in all regions (i.e., product and region are sparse dimensions), then this technique creates a new dimension whose values are the combinations of product and region where actual data exists. Combinations of product and region having no associated sales values would be eliminated. By this mechanism, the sparse dimensions of the data are replaced by a single combination dimension. The resulting data is stored densely using (n-(s-1)) dimensions, where n is the original number of dimensions and s is the number of sparse dimensions in the combinations. Storage is thereby allocated only for those combinations of sparse dimension values that have some data associated with them, reducing the amount of storage space required.
Combination dimensions are effective for optimizing storage allocation; however, this efficiency comes with a cost of increased overhead. In particular, the combination dimension is maintained separately from the underlying dimensions. That is, software solutions are provided to make sure that the combined dimensions are updated as new combination dimension values become populated with data. In addition, dimension value limits may need to be synchronized between the underlying dimensions and the combination dimensions. Furthermore, the combinations are explicitly sorted to obtain an ordered report. Consequently, support for any combination dimension should be incorporated at all levels or database management.
There can also be an increased cost at run time for combination dimensions. This increased processing is primarily a result of the fact that when a combination dimension is declared, a different cube is created. In order to show data from the original n-dimensional cube (which is what the user most likely would want to see), the n-(s-1)-dimensional cube first needs to be transformed into the n-dimensional cube. This transformation, performed during run time, imposes an extra burden on the system, which in turn, hinders system performance.