Almost all businesses are interested in deploying data warehouses to obtain business intelligence in order to improve profitability. It is widely recognized in the technical world that most data warehouses are organized in multidimensional fashion. The text by Ralph Kimball et al., The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, ISBN: 0471153370, 1996, describes the use of multidimensional schema to model data warehouses.
A multidimensional array layout has been used by many online analytical processing (OLAP) systems for organizing relatively small data warehouses. However, this multidimensional array structure does not scale well for large data warehouses such as those that require more than 100 gigabytes of storage. Such large data warehouses are still implemented using the relational database model. While conventional relational databases provide some clustering and data partitioning, these techniques are not adequate for supporting multidimensional data.
OLAP systems tend to organize data using many or all dimensions. For efficiency reasons, the conceptual multidimensional array is actually implemented by a multilevel structure. The dimensions are separated into dense and sparse sets based on the expected number of entries for each dimension value. The dense dimensions are implemented as a multidimensional array and the sparse dimensions are used to point to each sub-array. U.S. Pat. No. 5,359,724 by Earle describes such a technique. This arrangement is still inefficient because the dense dimensions are only partially utilized. For instance, in real-world data, it has been reported that dense arrays are usually only about 20% occupied.
Spatial databases and geographic information systems use a two- or three-dimensional data model. Many data structures and methods have been proposed for organizing and indexing spatial data, e.g., R-Trees, QuadTrees, and Grid Files. Some of these indexing structures have been implemented as extensions of a relational database management system (RDBMS) but have not considered the full requirement for maintenance and query processing required in data warehouses or other such implementations. Additionally, the techniques for efficiently clustering the two- or three-dimensional data have not been considered in these systems.
A multidimensionally clustered table is one whose data is simultaneously clustered along one or more independent dimensions, or clustering keys, and physically organized into blocks of pages on disk. Once such a table is created, one can specify one or more keys as dimensions along which to cluster the table's data. Each of these dimensions can comprise one or more columns as do index keys.
Every unique combination of dimension values forms a logical “cell” that comprises blocks of pages, where a block is a set of consecutive pages on disk. The set of blocks that contain pages with data having a certain key value of one of the dimensions is called a “slice”. Every page of the table is part of exactly one block, and all blocks of the table comprise the same number of pages: the blocking factor.
When inserting new records into the table, it is highly desirable that the number of blocks searched in a cell for space to record the new record in the table be minimized. Further, if none of the blocks in the table have space available to store the new record, and an additional block should be associated with a cell, it is highly desirable that it be quickly determinable what block can be assigned.
What is therefore needed is a system and associated method to minimize the number of blocks searched in a cell space to record the new record in the table. The method determines which block can be assigned if a table has space available to store a new record and an additional block should be associated with a cell. The need for such system and method has heretofore been unsatisfied.