When reporting data from relational databases, there are commonly cases wherein the data, to be useful as a result, is grouped and/or aggregated by a user or application. For example, when reporting on sales for a company, a mere list of every transaction does little good to the CEO of the company when making business decisions. Instead, the CEO requires a higher level view of the transaction data than such detail data at the lowest level of the data hierarchy. By grouping and aggregating the data into meaningful categories, e.g., total sales broken down by product and/or region, the transaction data is transformed into such higher level information based upon which appropriate higher level action can be taken.
Certain systems, such as relational databases, allow flexible, but often very complex approaches to writing queries that can compute aggregations on top of detail data. On the other hand, systems, such as On Line Analytical Processing (OLAP) systems, have their strengths in pre-computed or custom aggregations. In OLAP systems, the queries often explicitly specify which data is shown in the result set (often allowing aggregate data of varying hierarchy granularity levels). For example, a typical query for analyzing sales/order data might not follow traditional aggregation levels (such as continent, country, and region), but rather, it might match organizational structures (such as European sales division, Canadian sales division, US NorthWest sales division, etc.), which is an example set forth in more detail below.
FIG. 1, for instance, depicts exemplary non-limiting query instructions 100, e.g., multidimensional query language (MDX) query instructions, for producing a view on a data set. In this example, query code 100 is primarily a SELECT statement, which returns a result set of records from one or more tables, i.e., zero or more rows from one or more tables in a database. For some additional background, in specifying a SELECT query, the user specifies a description of the desired result set, but they do not specify what physical operations must be executed to produce that result set. Translating the query into an optimal “query plan” is left to the database system, more specifically, to a query optimizer component.
FIG. 2 illustrates an exemplary table construct 200 reflecting the query results from execution of the query 100. Table 200 shows that the query 100 implicates aggregation of underlying order transaction data at three different aggregation levels: Territory Group 210, Country 220 and Region 230. Territory Group 210 is broken down into at least a European territory group STG1, a Pacific territory group STG2 and a North American territory group STG3. Sales territory countries 220 are broken down into at least a Canadian territory country STC1 and United States territory country STC2. Sales territory regions 230 are broken down for the United States sales territory country STC2 as Northeast sales territory region STR1, Northwest sales territory region STR2, Southeast sales territory region STR3 and Southwest sales territory region STR4. Where a country 220 or region 230 does not apply to a territory group 210, as shown, the null value is recorded in table 200.
FIG. 2 includes a fourth column, order count 240, which stores the aggregation or subtotal of the number of orders that have been received from a particular group, country and/or region. As mentioned above, since Europe and the Pacific are supersets of countries and regions, the first two rows of table 200 having order counts OC1 and OC2, respectively, illustrate a first level of aggregation performed on the underlying data according to the organizational view that all sales from Europe and the Pacific are to be treated as single views. Also, in this organizational view, for purposes of order count OC3, Canada is treated as an entire country STC1 within a territory group 210, but which has no regions 230, and thus represents a second level of aggregation performed on the underlying data. Furthermore, as a third level of aggregation applied to the underlying data, the United States territory country STC2 associated with North America territory group STG3 is further broken down into subregions 230 as Northeast STR1, Northwest STR2, Southeast STR3 and Southwest STR4 having aggregate counts OC4, OC5, OC6 and OC7, respectively.
In this regard, as shown in FIG. 3, the desired output or report 300 that is displayed to a user via an application or the like for this query includes a sales region dimension 310 and an associated order count 320. For each sales region, the order count 320 is displayed so that the CEO can maintain these aggregate views of the data at the three different levels described above. Today some systems that access databases enable a user or application to designate the output of an aggregation operation, or subtotal, of data as “aggregate” data, in contrast to the “detail” data upon which operations were performed to form the “aggregate” data. In the above example, for instance, exemplary detail data would be each order individually listed as a record, including all of the data recorded for each order (shipping, billing, location, etc.). In contrast to such detail data, the aggregate data that resulted from executing the query 100, i.e., order counts OC1, OC2, OC3, OC4, OC5, OC6, OC7 of FIG. 2, while having particular meaning to the organization as a whole, has a different meaning as aggregate data in database terms. Put simply, order counts 240, once designated as aggregate data, are treated by the database separately from the detail data from which the aggregate data sprung forth.
The reason for this separate treatment of aggregate data and detail data in some query languages can be explained with reference to FIG. 4. As shown, table 400 includes nine rows r1, r2, r3, r4, r5, r6, r7, r8, r9 with columns sales country 410, sales region 420 and order count 430. Each order count 430, as shown, is explicitly tagged as either Detail data or Aggregate data per metadata or tag 440 associated with the data values. One of skill in the art can appreciate tag or metadata 440 could also be implemented as a column of the database. With Aggregate or Detail tags 440, when a query command is executed over the data in table 400 that, for instance, sums all of the orders over all of the data, the Aggregate data values are ignored in the calculation when adding up the values of rows r1, r2, r3, r4, r6 and r7, i.e., the values of rows r5, r8 and r9 are ignored in the calculation. To implicate an Aggregate value, the command language must explicitly request or calculate such a value distinctly from operations on the Detail data. This way, the sum of all orders will ignore rows r5, r8 and r9 when adding up the detail data of rows r1, r2, r3, r4, r6 and r7 to reach a total order count of 50. If it were not known to ignore the aggregate order counts of rows r5, r8 and r9, such order counts (highlighted by the bold bordering of the cells) would be included in the summing operation, yielding an incorrect result of 150.
As mentioned, the aggregate order counts of rows r5, r8 and r9 must be explicitly requested or calculated afresh as part of query execution, or else the order counts of rows r5, r8 and r9 will be skipped when the query is executed. However, for any of a variety of reasons, sometimes a user or application may nonetheless wish to re-categorize aggregate data as detail data, e.g., as the needs of an organization change.
For instance, as shown in FIG. 5, suppose a table 500 is desired which only includes rows r1, r2, r3, r4 and r8 of FIG. 4, which might happen to the organization if the East and West sides of the Canadian operation are merged under a single umbrella. Regardless of the reason, table 500 of FIG. 5 supposes a better view of the data; however, aggregating over all the detail data in table 500 yields the incorrect total of 30. In this case, the order count of 20 of r8 is ignored as aggregate data. Yet, presently, there is no way to re-categorize the aggregate data as detail data so that it is properly tallied in an aggregation query.
Accordingly, what is desired is a method by which already aggregated data can be easily re-categorized as detail data, or merged with detail data to operate on the aggregate data as detail data. What is further desired is a method by which aggregate data can be automatically re-categorized as detail data, or merged with detail data, where it is appropriate based on the usage of the data.