The present invention relates to databases and in particular to data aggregation on demand.
In the field of data warehousing, members of an organization will frequently need to summarize, or aggregate, vast quantities of data at various different levels of summarization within various stored dimensions.
A level of data in a dimension of a database is a grouping of the entries in that dimension. For example, if a dimension consists of different stores, the stores could be grouped at a city level, a state level, or any other conceivable level.
Different people will frequently wish to obtain the same summarized data at different times. Rather than querying the data in the data warehouse and aggregating the data every time aggregated data is required, it has become common practice to pre-aggregate data at various commonly queried levels of aggregation and store the aggregated data in the database for easy retrieval in what is referred to as a partition. The levels which are pre-aggregated and stored in this fashion are often defined by the system administrator. When further aggregations are performed at the request of a user, these might also be stored in a partition to be retrieved at a later date.
A problem with such storage of partitioned data is that when a request is made for data at a certain level aggregation which has not already been generated, the aggregation must be carried out using detail level data, despite the fact that the data might actually be stored at a hierarchically marginally lower level, and easily be aggregated further to the level in question. Levels are hierarchically above or below one another if all the detail records associated with any particular member of the lower level are also associated with a single member of the higher level. For example, a city level grouping of stores is hierarchically below a state level grouping of stores, as all stores in a particular city will also be in a particular state (assuming no cities lie across a state border). However, a "store size" level will normally not be hierarchically above or below a state level, as there is no reason for there to be a correlation between store size and state.
To overcome the problem of always having to resort to the detail level data to perform a new aggregation, it has become common practice to define virtual partitions for intersections which do not actually store data at a particular level, but instead present the outward appearance of holding the data at a certain level of aggregation, and are provided with the functionality to aggregate data from one or more lower level physical partitions which are stored. The virtual partitions are pre-programmed as to which physical partition data to use for aggregation.
However, for systems with a large number of dimensions and a significant number of levels in each dimension, the number of virtual partitions which must be defined can be prohibitive--both from an administrative perspective and a search/query perspective. If a virtual partition is to be maintained for each possible set of levels across all the dimensions, the number of virtual partitions which require storing is the product of the number of levels in each dimension. For example, a system with 6 dimensions each with 5 levels and 2 dimensions each with 7 levels results in a matrix of 765,625 cells.
What is required is a system to allow the server handling queries to choose the best partition from which to aggregate data "on the fly" without maintaining a database of appropriate partitions to aggregate from.