Almost all businesses are interested in deploying data warehouses to obtain business intelligence that can improve profitability. It is widely recognized in the technical field that most data warehouses are organized in a multidimensional fashion. The text by Ralph Kimball et al., The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, ISBN: 0471153370, 1996, describes the use of multidimensional schemas to model data warehouses.
A multidimensional array layout has been used by many online analytical processing (OLAP) systems to organize relatively small data warehouses. However, this multidimensional array structure does not scale well to large data warehouses, such as those requiring more than 100 gigabytes of storage. Such large data warehouses are therefore still implemented using the relational database model. While conventional relational databases provide some clustering and data partitioning, these techniques are not adequate for supporting multidimensional data.
OLAP systems tend to organize data along many or all of its dimensions. For efficiency, the conceptual multidimensional array is actually implemented as a multilevel structure. The dimensions are divided into dense and sparse sets according to the expected number of entries for each dimension value. The dense dimensions are stored as multidimensional sub-arrays, and the sparse dimensions form an index whose entries point to those sub-arrays. U.S. Pat. No. 5,359,724 by Earle describes such a technique. This arrangement is still inefficient because the dense sub-arrays are only partially occupied: for real-world data, dense arrays have been reported to be only about 20% occupied.
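The dense/sparse split can be illustrated with a small sketch. It is not the method of the Earle patent; it is a minimal illustration under stated assumptions. The class name `DenseSparseCube`, its methods, and the dimension names (month, product, store, customer) are hypothetical. The sparse dimensions are assumed to be represented by a dictionary key, and each dense sub-array is stored as a flat, row-major list in which `None` marks an empty cell. Because a sub-array is allocated in full as soon as any of its cells is written, the `occupancy` method shows how sparse data leaves dense storage only partially used.

```python
class DenseSparseCube:
    """Hypothetical two-level cube: sparse-dimension keys index dense sub-arrays.

    Assumption for illustration: each dense sub-array is a flat row-major list,
    and None marks an empty cell (so None cannot be stored as a real value).
    """

    def __init__(self, dense_sizes):
        self.dense_sizes = tuple(dense_sizes)
        self.block_len = 1
        for n in self.dense_sizes:
            self.block_len *= n
        self.blocks = {}  # sparse coordinates (tuple) -> dense sub-array (list)

    def _offset(self, dense_coords):
        # Row-major linearization of the dense coordinates within one sub-array.
        if len(dense_coords) != len(self.dense_sizes):
            raise ValueError("wrong number of dense coordinates")
        off = 0
        for c, n in zip(dense_coords, self.dense_sizes):
            if not 0 <= c < n:
                raise IndexError(f"dense coordinate {c} out of range 0..{n - 1}")
            off = off * n + c
        return off

    def put(self, sparse_key, dense_coords, value):
        block = self.blocks.get(sparse_key)
        if block is None:
            # The whole dense sub-array is allocated on first write.
            block = [None] * self.block_len
            self.blocks[sparse_key] = block
        block[self._offset(dense_coords)] = value

    def get(self, sparse_key, dense_coords):
        block = self.blocks.get(sparse_key)
        if block is None:
            return None  # no sub-array exists for this sparse combination
        return block[self._offset(dense_coords)]

    def occupancy(self):
        # Fraction of allocated dense cells that actually hold a value.
        allocated = self.block_len * len(self.blocks)
        if allocated == 0:
            return 0.0
        used = sum(v is not None for b in self.blocks.values() for v in b)
        return used / allocated


if __name__ == "__main__":
    # Hypothetical dense dimensions: 12 months x 5 products = 60 cells per sub-array.
    cube = DenseSparseCube([12, 5])
    # One (store, customer) pair buys only product 0, once per month.
    for month in range(12):
        cube.put(("store-7", "cust-42"), (month, 0), 100 + month)
    print(cube.get(("store-7", "cust-42"), (3, 0)))  # 103
    print(cube.occupancy())  # 0.2 (12 of 60 allocated cells used)
```

In this toy run, only 12 of the 60 allocated dense cells are used. That is the kind of partial utilization the paragraph above describes. The 0.2 here follows from the contrived input and is not a measurement.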
Spatial databases and geographic information systems use a two- or three-dimensional data model. Many data structures and methods have been proposed for organizing and indexing spatial data, e.g., R-Trees, QuadTrees, and Grid Files. Some of these indexing structures have been implemented as extensions of a relational database management system (RDBMS), but they do not address the full maintenance and query-processing requirements of data warehouses or similar applications. Additionally, these systems have not considered techniques for efficiently clustering two- or three-dimensional data.