This invention relates to data storage and retrieval systems and more particular to multidimensional databases implemented by relational database management systems.
A multi-dimensional database comprises a group of data cells arranged by the dimensions of the data. For example, a spreadsheet exemplifies a two-dimensional array with the data cells arranged in rows and columns, each being a dimension. A three-dimensional array can be visualized as a cube with each dimension forming a side of the cube, including any slice parallel with that side. Higher dimensional arrays have no physical metaphor, but typically organize the data in the way users think of their enterprise. Typical enterprise dimensions are time, measures, products, geographical regions, sales channels, etc . . .
In a multi-dimensional database there is a need to be able to aggregate a large amount of data quickly and efficiently. Multi-dimensional databases generally have hierarchies or formula-based relationships of data within each dimension. Aggregation (also called xe2x80x9cconsolidationxe2x80x9d and xe2x80x9croll upxe2x80x9d) involves computing all of these data relationships for one or more dimensions, for example, adding up all Departments to get Total Division data. While such relationships are normally summations, any type of computational relationship or formula might be defined. Typically, database users have a given time window that they need to be able to fit the aggregation process into and the speed of the database determines the amount of data that they can process in that time. The performance of this one operation therefore has a large impact on the usefulness of the database. Improving the aggregation speed not only assists current users in handling more data faster, but can also makes the database attractive to new users when its capabilities improve enough to fit their requirements.
In typical cases a large portion of the data in a multidimensional database is extremely sparse; that is, a high percentage of the possible combinations (intersections) of the members from the data set""s dimensions contain missing data. The total possible number of intersections can be computed by multiplying together the number of members in each dimension. Data sets containing one percent, 0.01 percent or even smaller percentages of the possible data are quite common. Sparse data may be modeled in one of two kinds of structures: composite and conjoint dimensions. Each of these creates a mapping from a sparse multi-dimensional framework onto a single dense dimension, such that each instantiated multi-dimensional tuple corresponds to exactly one dimension value of the conjoint or composite. The difference between the two types is that the composite handles most of the complexity transparently while the conjoint makes it explicit. The transparent nature of composites leads to decreased complexity and easier administration as well as the potential to optimize parts of its layout and construction internally.
Traditionally the process for aggregating sparse data is done in one of two ways differentiated by whether the aggregation is occurring over a conjoint or a composite dimension.
Aggregation over a composite dimension occurs as several aggregations, one for each base dimension of the composite. Each base is aggregated as if it was a dense dimension and a temporary new composite of the remaining dimensions is used for looping. Further if any base dimensions have multiple hierarchies over them, this aggregation over the base will occur once for each such hierarchy.
Aggregation over a conjoint dimension occurs by first transforming each base dimension hierarchy into a hierarchy on the conjoint dimension. Once these hierarchies are created the conjoint is aggregated on each of these hierarchies resulting in the same number of aggregation passes as the composite solution. This method takes more administrative setup than the composite solution, but does not require creating the temporary new composite of the remaining base dimensions.
In a preferred method employed in the Oracle Express Server available from Oracle Corporation of Redwood Shores, Calif., a xe2x80x9ccomposite join hierarchyxe2x80x9d is derived from all of the base hierarchies of the composite. This join hierarchy encapsulates all the information of all base hierarchies into a single hierarchy over the composite itself. Once this hierarchy is created the composite can be treated just like a large base dimension with a single hierarchy, allowing it to be aggregated in a single pass. This composite join hierarchy is transparently created, has the ease of administration of the composite solution, and outperforms aggregation over a conjoint dimension or over a composite dimension in nearly all cases.
The present invention takes the form of a method for aggregating sparse data in a multidimensional array using a composite-join hierarchy in a way that segments the data so that any piece of the hierarchy is smaller and more likely to fit in memory.
The algorithm can be thought of as a recursive sub-cubing, wherein an n-dimensional cube is broken into a number of (nxe2x88x921)-dimensional cubes and each of those cubes are solved as (nxe2x88x922)-dimensional cubes etc. Within each division, the processing is segmented by hierarchy level so a dimension with three hierarchy levels (for example, month-quarter-year) would form three separate subcubes with one less dimension. This algorithm produces one xe2x80x98worklistxe2x80x99 for every combination of hierarchy levels in the cube.
Each of these worklists may be represented as a bitmap of the cells contained within it and may be used as a basis of generating more aggregate worklists. To minimize the need for input-output data transfers, all the derived worklists of a single worklist are generated at the same time. This is accomplished without keeping more than n-worklists active at any given time with the result that a reduction in input-output data transfers is achieved with out requiring substantially larger memory space.
These and other features and advantages of the invention may be better understood by considering the detailed description that follows. In the course of this description, frequent reference will be made to the attached drawings.