Business data is typically stored within physical tables of a database. The database may comprise a relational database such as SAP MaxDB, Oracle, Microsoft SQL Server, IBM DB2, Teradata and the like. Alternatively, the database could be a multi-dimensional database, an eXtendable Markup Language document, or any other structured data storage system. The physical tables may be distributed among several relational, dimensional, and/or other data sources.
The structures of and relationships between the physical database tables are complex. A typical end user is therefore unable to locate or extract desired information from the physical database tables. Business Intelligence (BI) tools (e.g., BusinessObjects Information Designer®) may therefore be used to build an abstraction layer that shields end users from the complexity of the physical tables. More specifically, the abstraction layer allows the end users to query a database using intuitive terms rather than references to specific physical entities of the database.
Commonly-assigned and co-pending U.S. patent application Ser. No. 12/463,702 describes such an abstraction layer, referred to therein as a semantic layer. Briefly, the “business objects” of an abstraction layer represent business entities, such as customers, products, stores, time, sales figures, etc., represented in the data of a data source. Business objects may be classified as dimension objects (i.e., to represent dimensions along which one may want to perform an analysis or report), detail objects (e.g., to represent additional information on dimensions), and measure objects (e.g., to represent indicators, most often numeric, whose value can be determined for a given combination of dimension values). In one example, a Sales measure object may be used to determine the total sales for January (i.e., a value of the Month dimension object) in France (i.e., a value of the Country dimension object).
Within a business, the same data (e.g., revenue) may be tracked by different departments at different levels of granularity. For example, at the corporate level, revenue may be recorded for each line of business, while each line of business may record revenue at the product level. Due to data entry errors, incomplete data, etc., the total revenue recorded by a particular line of business (i.e., the aggregation of revenues associated with each product of the line of business) may not equal the revenue recorded for that line of business at the corporate level.
FIG. 1 displays portions of physical database tables 110, 120 and 130 which illustrate the foregoing scenario. Tables 110, 120 and 130 may be contained in a data warehouse associated with a particular business. According to the present example, table 110 may be considered sales data compiled at the corporate level. Each row of table 110 associates a month and a company with sales figures for that company during that month.
Table 120 includes sales data compiled at the activity level. Each row of table 120 associates a month, a company and an activity with sales figures for that month, company and activity. Lastly, it may be assumed that table 130 includes sales data compiled by a particular company at the product level. The data of tables 110, 120 and 130 may appear partially redundant, however, since these tables are fed independently, several inconsistencies are exposed therein.
For example, table 110 shows total sales of 10000 for the company “Acco” in January 2008 (i.e., 01/08). In contrast, table 120 shows total sales of 9490 (i.e., 5000+4490) for the same company and period. Moreover, table 120 shows sales of 5000 during January 2008 for Acco and the activity “metal”, while table 130 shows sales of 5020 (i.e., 2000+2000+1020) for the same period, company and activity.
FIG. 2 illustrates database table 200 including the data of tables 110, 120 and 130. Database table 200 therefore shows sales data compiled at different levels of granularity. Database table 200 will be used to demonstrate that, if sales data is stored at different levels of granularity, the desired value of a Sales measure usually cannot be determined by aggregating the stored sales data at its most granular level.
More specifically, a case is considered in which a Sales measure of an abstraction layer is bound to column 202 of FIG. 2, and dimensions Month, Company, Activity and Product are bound to columns 204, 206, 208 and 209, respectively. The query {Dim: 1/08, Dim: Acco, Meas: Sales} will provide the result 24,510, which is the sum of column 202 for each row of table 200 which includes the month 1/08 and the company Acco. This result is virtually meaningless from a business standpoint.
Conventional systems attempt to address the foregoing by creating a different measure for each level of granularity of a single measure (e.g., Sales). FIG. 3 illustrates data model 300 of a conventional Online Analytic Processing (OLAP) system including such different measures.
Data model 300 includes CorporateSales measure 310, SalesbyActivity measure 320, and SalesbyActivityandProduct measure 330. Each measure is independent from the others and individually declares the dimensions that govern it. CorporateSales measure 310 is governed only by Month dimension 340 and Company dimension 350, SalesbyActivity measure 320 is governed only by Month dimension 340, Company dimension 350 and Activity dimension 360, and SalesbyActivityandProduct measure 330 is governed only by Month dimension 340, Company dimension 350, Activity dimension 360 and Product dimension 370.
With reference to FIG. 1, CorporateSales measure 310 may be bound to column 112 of table 110, SalesbyActivity measure 320 may be bound to column 122 of table 120 and SalesbyActivityandProduct measure 330 may be bound to column 132 of table 130. The sales data collected at each level of granularity may therefore be accessed independently via the three measures CorporateSales, SalesbyActivity, and SalesbyActivityandProduct.
The measures of data model 300 are considered orthogonal to one another. In other words, data model 300 does not provide a data consumer with any indication of relationships between the measures which may actually exist in the underlying physical tables. Consequently, data model 300 does not provide any mechanism for semantically drilling down or drilling up among the multiple analysis levels represented by the measures. The multiple analysis levels may introduce data discrepancies due to simplification or errors. The existence and nature of the differences between the analysis levels would assist a data consumer in evaluating the reliability of data which is retrieved through each analysis level.
What is needed is an efficient system to represent a same measure which is tracked at different levels of detail and to facilitate navigation between the levels. Such a system may reduce a need to maintain multiple unrelated representations of a same measure, and may provide more meaningful evaluation of stored data and of the discrepancies reflected therein.