The present invention relates generally to the field of selection operations in data warehouses, and more specifically, to slice and dice operations in data warehouses.
Enterprises are building increasingly large information warehouses to enable advanced information analytics and to improve the “business value” of information. A data warehouse is also called a data cube. A data cube is queried using online analytical processing (OLAP) operations, of which slice and dice are two fundamental examples. In particular, the slice operation performs a selection on one dimension of the given cube, and the dice operation performs a selection on two or more dimensions.
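The scalar-level slice and dice operations described above can be illustrated with a minimal Python sketch over a toy fact table. The table contents, the dimension names (`region`, `year`), the measure name (`sales`), and the helper names `slice_cube`, `dice_cube`, and `total_sales` are hypothetical and chosen only for illustration; they are not taken from any particular warehouse product.

```python
# Toy fact table: each row is one fact with scalar dimension values.
# All names and values below are illustrative assumptions.
FACTS = [
    {"region": "east", "year": 2009, "sales": 10},
    {"region": "east", "year": 2010, "sales": 20},
    {"region": "west", "year": 2009, "sales": 30},
    {"region": "west", "year": 2010, "sales": 40},
]


def slice_cube(facts, dim, value):
    """Slice: select the facts whose single dimension `dim` equals `value`."""
    return [f for f in facts if f[dim] == value]


def dice_cube(facts, conditions):
    """Dice: select on two or more dimensions at once.

    `conditions` maps each dimension name to the set of allowed values.
    """
    return [
        f for f in facts
        if all(f[dim] in allowed for dim, allowed in conditions.items())
    ]


def total_sales(facts):
    """Aggregate the `sales` measure over the selected facts."""
    return sum(f["sales"] for f in facts)


# Slice on one dimension (region), then aggregate.
east_total = total_sales(slice_cube(FACTS, "region", "east"))

# Dice on two dimensions (region and year), then aggregate.
west_2010_total = total_sales(
    dice_cube(FACTS, {"region": {"west"}, "year": {2010}})
)
```

In this sketch every fact carries exactly one value per dimension, which is the assumption that scalar-level selection relies on.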
In practice, a data cube may have dimensions with many-to-many relationships to the facts. For instance, in a patent (article) data cube, one patent (article) may have multiple inventors (authors), and one inventor (author) may file (write) multiple patents (articles). As a result, a patent data cube has to handle queries over many-to-many relationships.
As an example, a book cube may have two dimensions: an author dimension and a category dimension. One book can have multiple authors and belong to multiple categories.
If a book has multiple authors and categories, the slice and dice operations can be more complicated. For instance, a user may want to find the total sales for each book coauthored by “Mike” and “John”. The existing scalar-level slice and dice operations cannot support such set-level query semantics. Further, users may have even more complex query semantics, such as finding the total sales for each book authored ONLY by “Mike” and “John”, or finding the total sales for each book that belongs to the category “statistics” but not “finance”.
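The three set-level queries mentioned above can be stated precisely with a short Python sketch. It illustrates only the intended query semantics, using a naive scan over a toy book cube; it is not the efficient method that the need statement calls for. The book data and the helper names `contains_all`, `equals_exactly`, `contains_none`, and `sales_per_book` are hypothetical.

```python
# Toy book cube: the author and category dimensions are set-valued,
# reflecting the many-to-many relationships. All data is illustrative.
BOOKS = [
    {"title": "A", "authors": {"Mike", "John"},
     "categories": {"statistics"}, "sales": 100},
    {"title": "B", "authors": {"Mike", "John", "Ann"},
     "categories": {"statistics", "finance"}, "sales": 50},
    {"title": "C", "authors": {"Mike"},
     "categories": {"finance"}, "sales": 70},
]


def contains_all(dim, required):
    """Predicate: the book's set for `dim` includes every required value."""
    return lambda book: required <= book[dim]


def equals_exactly(dim, required):
    """Predicate: the book's set for `dim` is exactly the required set."""
    return lambda book: book[dim] == required


def contains_none(dim, excluded):
    """Predicate: the book's set for `dim` shares no value with `excluded`."""
    return lambda book: book[dim].isdisjoint(excluded)


def sales_per_book(books, *predicates):
    """Return {title: sales} for books satisfying every predicate."""
    return {
        b["title"]: b["sales"]
        for b in books
        if all(p(b) for p in predicates)
    }


# Books coauthored by Mike and John (other coauthors allowed).
coauthored = sales_per_book(BOOKS, contains_all("authors", {"Mike", "John"}))

# Books authored ONLY by Mike and John.
only_mike_john = sales_per_book(
    BOOKS, equals_exactly("authors", {"Mike", "John"})
)

# Books in category "statistics" but not "finance".
stats_not_finance = sales_per_book(
    BOOKS,
    contains_all("categories", {"statistics"}),
    contains_none("categories", {"finance"}),
)
```

Each predicate compares a whole set of dimension values against the query set (superset, equality, or disjointness), which is what distinguishes these queries from a scalar-level selection that tests a single value per dimension.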
The known art in data warehousing does not adequately address set-level slice and dice operations or how to support these operations efficiently. Some of the known art has focused on improving performance across a large set of queries by reusing results or by materializing some intermediate results. Other known art has focused on developing warehouse-specific optimization algorithms for standard aggregation queries.
Hence, there is a need for a more efficient system and method for supporting set-level slice and dice in data warehouses.