Current data warehouse implementations use the “group by” functionality offered by standard relational databases to simplify the computations needed to generate the results requested in a query. However, for some types of data requests, conventional solutions do not represent the requests in a form that facilitates fast and efficient calculation of the requested results.
In particular, for complex multiqueries, data warehouse applications must perform complex processing to represent the queries in a form that a data repository can handle. The warehouse applications must then further process the results returned by the data repository to complete the calculation of the requested results to be presented to the user. Such an arrangement has two major drawbacks: users often find the average response times for these queries unsatisfactory, and in worst-case scenarios, the time needed to return the results requested in the multiquery may have no defined bound.
Such disadvantages are illustrated in the following example:
Compare the revenues of (requested results):
Product P1 in the first quarter of 2004
All products in the first quarter of 2004
Product P1 in 2004
Queries:
Select product=P1 AND calday between 20040101 and 20040331, group by quarter
Select calday between 20040101 and 20040331, group by quarter
Select product=P1 AND calday between 20040101 and 20041231, group by year
PART multiquery:
part0: calday between 20040101 and 20041231 (global restriction)
part1: product=P1 and calday between 20040101 and 20040331
part2: calday between 20040101 and 20040331
part3: product=P1
Conventional database query:
calday between 20040101 and 20041231 AND
    ((product=P1 AND calday between 20040101 and 20040331) OR
     (calday between 20040101 and 20040331) OR
     (product=P1))
group by product, quarter
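The conventional database query above can be tried out on a small in-memory table. The following is a minimal sketch only, assuming a hypothetical fact table named sales with columns product, calday (stored as a YYYYMMDD integer) and revenue; the table name, the column layout, the sample rows, and the arithmetic used to derive the quarter are illustrative assumptions and are not taken from any particular data warehouse product.

```python
import sqlite3

# Hypothetical sample facts: (product, calday as YYYYMMDD, revenue).
SAMPLE = [
    ("P1", 20040115, 100.0),
    ("P1", 20040220, 50.0),
    ("P2", 20040310, 70.0),
    ("P1", 20040705, 30.0),
    ("P2", 20040801, 999.0),  # dropped: neither product P1 nor first quarter
    ("P1", 20050101, 5.0),    # dropped by the global restriction (part0)
]

def run_combined_query(rows):
    """Run the OR-combined multiquery and return the intermediate result set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, calday INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    sql = """
        SELECT product,
               (calday / 100 % 100 + 2) / 3 AS quarter,  -- month 1..12 -> quarter 1..4
               SUM(revenue) AS revenue
        FROM sales
        WHERE calday BETWEEN 20040101 AND 20041231
          AND ((product = 'P1' AND calday BETWEEN 20040101 AND 20040331)
               OR (calday BETWEEN 20040101 AND 20040331)
               OR (product = 'P1'))
        GROUP BY product, quarter
        ORDER BY product, quarter
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in run_combined_query(SAMPLE):
        print(row)
```

In this sketch the repository returns one row per (product, quarter) combination that satisfies any of the parts, so none of the three requested results is available directly from the returned rows without further work in the application.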
In response to the multiquery, the data repository returns an intermediate result set that may be quite large and may require extensive further processing in the application to identify the data responsive to each request and to return the requested results to the user. In some cases, the further processing substantially reduces the size of the intermediate result set, which may contain data of an excessively fine granularity compared to the requested results. This reduction suggests that the database returned more results than necessary, causing excessive network traffic and delaying presentation of the requested results.
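The further processing described above can be sketched as follows. This is an illustrative example only: the intermediate rows are made-up values shaped as (product, quarter, revenue) tuples, matching the "group by product, quarter" granularity of the example, and the function name reduce_intermediate is a hypothetical helper rather than part of any existing application.

```python
# Hypothetical intermediate result set at (product, quarter) granularity,
# as a repository might return it for the example multiquery.
INTERMEDIATE = [
    ("P1", 1, 150.0),
    ("P1", 3, 30.0),
    ("P2", 1, 70.0),
]

def reduce_intermediate(rows, product="P1", quarter=1):
    """Collapse the fine-grained rows into the three requested results."""
    # part1: the given product in the given quarter
    part1 = sum(rev for p, q, rev in rows if p == product and q == quarter)
    # part2: all products in the given quarter
    part2 = sum(rev for p, q, rev in rows if q == quarter)
    # part3: the given product over the whole year (all quarters)
    part3 = sum(rev for p, q, rev in rows if p == product)
    return {"part1": part1, "part2": part2, "part3": part3}

if __name__ == "__main__":
    print(reduce_intermediate(INTERMEDIATE))
```

Every intermediate row has to be transferred to the application and scanned once per part, although only three numbers are ultimately presented to the user. With many products and finer time granularity, the gap between the number of rows transferred and the number of values presented grows accordingly.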