Relational databases are known which are used to store large quantities of normalized data. Traditional relational databases store data in the form of relations. A primary key is used to correspond to queried values. For a given primary key, there is associated data arranged in a two-dimensional form. Queries are performed by defining query values to be searched in the primary key, and traversing the relations to find the desired information.
Many applications, however, have a need to structure data in multiple dimenstions. In multi-dimensional databases, data values are located at points in an n-dimensional conceptual space defined by specific positions along one or more axes. For example, three axes specifying month, product and sales district would delineate a three-dimensional space containing sales data. Each individual dollar value for sales would be identified by the combination of one specific point along each axis: the sales value for a specific month, product and sales district. The n-dimensional space can be visualized as a data cube.
In the relational database, this example would be modeled by constructing a single primary key based on the month, product, and sales districts with an additional column for the value being stored. The relational model, therefore, requires additional space because it is storing the primary key as part of each row, and is further inefficient with respect to queries within a given subplane of the cube, such as accessing all rows where sales district is limited to a specific value.
In a typical multi-dimensional data cube, a storage location is allocated for every possible combination of every dimensional value. For example, in a three-dimensional data cube wherein each dimension has three possible values, 27 storage locations are allocated, one for each possible combination of the dimensional values. The lowest level detail data is typically provided by data entry and each upper level data cell is then computed by aggregating the detail data to fill in the data cube. To speed access time for users, the data cube can be stored in main memory.
In many multi-dimensional data cubes, especially large ones with many different dimensions, there are certain combinations of dimensional values for which there are relatively few data values. For example, if a data cube contains sales values dimensioned by product, region, and time, there may not be any sales values for certain products in certain regions during at least some time periods. In large multi-dimensional databases having several dimensions, it is fairly common for most of the data cube to be empty. Indeed, certain financial data may be much less than one percent populated.
Dimensions having many dimensional value combinations for which there are relatively few data values are referred to as “sparse.” For sparse dimensions, it is wasteful to allocate storage space for each possible combination of dimensional values because many of the combinations will contain no data values. Instead, storage space can be allocated only for those dimensional combinations having data values. The problem of efficiently allocating storage space in a multi-dimensional data cube is referred to as “sparsity management.” Sparsity management may include combining multiple dimensional attributes into a composite tuple which allows several sparse dimensions to be combined, thereby saving storage space.
Further, many dimensions define data values that are combinations of other data values. For example, in the sales data above, the time dimension might include attributes for months, quarters, and years, in which the data values for “January,” “February,” and “March” are combined for the data value for “Quarter 1.” Similarly, the data values for the four quarters are combined, or aggregated, to compute the value for “Year.” Such attributes that are combinations of other attributes define a hierarchy of associated data values, in which the data values corresponding to one attribute include the values in other attributes
Often, computing the data values that are aggregated from other associated data values can be time consuming. Data values used in computing the aggregate values may not be stored in the same area of the storage medium. Retrieving the data values from the storage medium, such as a disk, for example, may require many fetches. The aggregation operation which computes the aggregate values may have to iterate through many associated data values. Multiple and often redundant fetches may need to be performed to fetch values stored on the same disk page. These additional fetches increase the time and resources required to complete the aggregation operation. In a large multidimensional database, such increases can be substantial.