Business decision makers often use business intelligence analytical software to pose operational performance questions as queries against their data sources. The basic capabilities of querying and reporting functions are extended by On-line Analytical Processing (OLAP). OLAP enables users to make more efficient managerial and strategic decisions by providing a robust understanding of the data from a variety of perspectives and hierarchies in a multidimensional database. Business decision makers who require access to large amounts of data in order to make their business decisions are able to use OLAP to manipulate data quickly and effectively.
Exemplary analytical and navigational activities provided by OLAP include: calculations and modeling applied across dimensions, through hierarchies and/or across members; trend analysis; slicing subsets; drill-down to deeper dimensional levels of consolidation; drill-through to other detail data; and pivot to new dimensional comparisons.
Other OLAP functionality that provides insights into business growth, spending, and sales patterns includes operations for ranking, moving averages, growth rates, statistical analysis, and “what if” scenarios.
Multidimensional databases intuitively view data through a multidimensional structural metaphor called a cube, whose cells correspond to events that occurred in the business domain. Each event is quantified by a set of measures; each edge of the cube corresponds to a relevant dimension for analysis, typically associated with a hierarchy of attributes that further describe it. A multidimensional database may further comprise a collection of related cubes. Dimensions, an essential and distinguishing concept in multidimensional databases, are used for selecting and aggregating data at the desired level of detail.
FIG. 1(a) illustrates the conceptual structure of a multidimensional database 100. A dimension 102, 104, or 106 is a structural attribute that is a list of members, all of which are of a similar type in the user's perception of the data. For example, the year 1997 108 and all quarters, Q1 110, Q2 112, Q3 114, and Q4 116, are members of the Time dimension 102; the Outdoor 118, Environment 120 and Sport 122 are members of the Product dimension 104; and Revenue 124, Cost 126 and Profit 128 are members of the Measures dimension 106. Moreover, each dimension 102, 104, or 106 is considered a member of the multidimensional database 100.
FIG. 1(b) illustrates the logical structure of a multidimensional database 130 arranged as a multidimensional array. Every data item in the multidimensional database 130 is located and accessed based on the intersection of the members of the dimensions 102, 104 and 106. The array comprises a group of data cells arranged by the dimensions 102, 104 and 106 of the data.
A dimension acts as an index for identifying values within the cube. If a single member of one dimension is selected, then the remaining dimensions, in which a range of members or all members are selected, define a sub-cube in which the number of dimensions is reduced by one. If all but two dimensions have a single member selected, the remaining two dimensions define a slice or a page. If all dimensions have a single member selected, then a single cell is defined. Dimensions offer a very concise, intuitive way of organizing and selecting data for retrieval, exploration and analysis.
In the multidimensional database example 130 shown as a cube in FIG. 1(b), the dimensions are Time 102, Product 104, and Measures 106. The cube is three dimensional, with each dimension represented by an edge axis of the cube. The intersections of the dimension members are represented by cells in the multidimensional database, each of which specifies a precise intersection along all dimensions that uniquely identifies a single data point. For example, the intersection of Q4 116, Revenue 124 and Environment 120 contains the value 132, representing the revenue for environmental products in the fourth quarter of 1997.
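The selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cube is modeled as a dictionary keyed by (Time, Product, Measures) member tuples, the helper name `select` is hypothetical, and the cell values are placeholders derived from the member positions, not data from FIG. 1(b).

```python
# Dimension members taken from FIG. 1(a).
TIME = ["Q1", "Q2", "Q3", "Q4"]
PRODUCT = ["Outdoor", "Environment", "Sport"]
MEASURES = ["Revenue", "Cost", "Profit"]

# Placeholder cell values (assumption): 100*time index + 10*product index + measure index.
cube = {
    (t, p, m): 100 * ti + 10 * pi + mi
    for ti, t in enumerate(TIME)
    for pi, p in enumerate(PRODUCT)
    for mi, m in enumerate(MEASURES)
}


def select(cells, time=None, product=None, measure=None):
    """Keep the cells whose coordinates match every fixed member.

    A value of None leaves that dimension unconstrained (all members selected).
    """
    wanted = (time, product, measure)
    return {
        coords: value
        for coords, value in cells.items()
        if all(w is None or w == c for w, c in zip(wanted, coords))
    }


# One dimension fixed: a sub-cube with one dimension fewer (Time x Product).
sub_cube = select(cube, measure="Revenue")

# In a three-dimensional cube, fixing one member also leaves a two-dimensional page.
page = select(cube, time="Q4")

# Every dimension fixed: exactly one cell.
cell = select(cube, time="Q4", product="Environment", measure="Revenue")
```

Here `sub_cube` holds 12 cells, `page` holds 9, and `cell` holds the single intersection of Q4, Environment and Revenue.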
Cubes generally have hierarchies or formula-based relationships of data within each dimension. Consolidation involves computing all of these data relationships for one or more dimensions. An example of consolidation is adding up all revenues in the first quarter. While such relationships are normally summations, any type of computational relationship or formula might be defined. In fact, there is no strict requirement to even have a relationship defined.
Members of a dimension are included in a calculation to produce a consolidated total for a parent member. Children may themselves be consolidated levels, which require that they in turn have children. A member may be a child for more than one parent, and a child's multiple parents may not necessarily be at the same hierarchical dimensional level, thereby allowing complex, multiple hierarchical aggregations within any dimension.
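As a minimal sketch of consolidation, the following Python fragment rolls leaf values up a parent-child hierarchy by summation. The member "H2" (a second parent for Q3 and Q4), the function name `consolidate`, and all revenue figures are assumptions made for illustration; summation is used because the text describes it as the usual relationship, although any formula could be substituted.

```python
# Parent -> children. "1997" comes from FIG. 1(a); "H2" is a hypothetical
# second parent, showing that a child may belong to more than one parent.
CHILDREN = {
    "1997": ["Q1", "Q2", "Q3", "Q4"],
    "H2": ["Q3", "Q4"],
}

# Placeholder leaf-level revenue values (assumption, not real data).
LEAF_REVENUE = {"Q1": 10.0, "Q2": 20.0, "Q3": 30.0, "Q4": 40.0}


def consolidate(member):
    """Return the consolidated total for a member.

    Leaves return their stored value; parents return the sum of their
    children, which may themselves be consolidated levels.
    """
    if member in LEAF_REVENUE:
        return LEAF_REVENUE[member]
    return sum(consolidate(child) for child in CHILDREN[member])


year_total = consolidate("1997")       # sums Q1..Q4
second_half_total = consolidate("H2")  # sums Q3 and Q4 only
```

Because Q3 and Q4 appear under both parents, their values contribute to both consolidated totals.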
Drill-down (to show more detail), roll-up (to show less detail), and pivot (to change axis dimensions) are currently available analytical techniques whereby the business decision maker navigates among dimensional levels of data ranging from the summarized to the detailed. The drilling paths may be defined by the hierarchies within dimensions or other relationships that may be dynamic within or between dimensions. For example, when viewing data for Revenue 124 for the year 1997 108 in FIG. 1(a), a drill-down operation in the Time dimension 102 would then display members Q1 110, Q2 112, Q3 114, and Q4 116.
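The three navigation operations can be illustrated with a short Python sketch. The function names, the nested-dictionary cross-tab layout, and the numeric values are assumptions for illustration only; they are not taken from any particular OLAP product.

```python
# Time hierarchy from FIG. 1(a): the year 1997 and its quarters.
CHILDREN = {"1997": ["Q1", "Q2", "Q3", "Q4"]}


def drill_down(member):
    """Return the lower-level members of a member (empty if it is a leaf)."""
    return CHILDREN.get(member, [])


def roll_up(member):
    """Return the parent members that consolidate the given member."""
    return [parent for parent, kids in CHILDREN.items() if member in kids]


def pivot(crosstab):
    """Swap the row and column axes of a cross-tab stored as {row: {col: value}}."""
    swapped = {}
    for row, columns in crosstab.items():
        for col, value in columns.items():
            swapped.setdefault(col, {})[row] = value
    return swapped


# Placeholder cross-tab (assumption): quarters as rows, measures as columns.
by_quarter = {
    "Q1": {"Revenue": 50.0, "Cost": 30.0},
    "Q2": {"Revenue": 60.0, "Cost": 35.0},
}
by_measure = pivot(by_quarter)  # measures as rows, quarters as columns
```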
Current business intelligence analytical software requires the business decision makers to explore OLAP cubes on their own. The exploration of data may be facilitated by a cross tabulation on a user interface.
When a data value of interest, for example, data which is outside a predictable pattern or a typical range, has been discovered, the main course of action is to drill down into more detail to get a breakdown of how the value is constituted based on lower-level members in a multidimensional hierarchy. If there are no lower-level members to drill to, the business decision makers may have the option to “drill-through” to an alternate exploration instance that might provide more detail about how the data value of interest resulted from its constituent parts.
Drilling down to more details may provide some insight into which constituent members are contributing to the data value of interest, but may not pinpoint the causality in data values. Major influencers on a specific data value in the context of a multidimensional cube could lie outside the drill path, i.e. outside the dimensions making up the cross-tab; the question of “why” is therefore not answered completely.
In addition, many of the cells in a cube are interconnected by formulas. Cells representing profit, for example, are calculated as the difference between corresponding cells representing revenue and corresponding cells representing costs. Cells representing a year are computed as the sum of corresponding cells representing quarters. The drill-down and drill-through tasks typically require user experience. A user may have to experiment, using trial and error, with many possible data displays before finding interesting exceptions. Therefore, the results of these data explorations may not be easily reproducible.
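The two formula relationships mentioned above (profit as revenue minus cost, and a year as the sum of its quarters) can be sketched as follows. All figures are placeholder values chosen for illustration, and the helper name `year_total` is hypothetical.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Placeholder quarterly values (assumption, not real data).
revenue = {"Q1": 50.0, "Q2": 60.0, "Q3": 55.0, "Q4": 70.0}
cost = {"Q1": 30.0, "Q2": 35.0, "Q3": 40.0, "Q4": 45.0}

# Formula across the Measures dimension: profit = revenue - cost, cell by cell.
profit = {q: revenue[q] - cost[q] for q in QUARTERS}


def year_total(by_quarter):
    """Formula along the Time dimension: a year is the sum of its quarters."""
    return sum(by_quarter[q] for q in QUARTERS)


# The year-level profit cell can be reached through either formula path.
profit_from_quarters = year_total(profit)
profit_from_measures = year_total(revenue) - year_total(cost)
```

With these additive formulas both paths give the same year-level profit, which shows how a single cell can depend on many other cells along more than one dimension.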
Manually exploring all the data values in the context of a multidimensional cube outside the drill path is not practical. A multidimensional database may include many dimensions, each with a hierarchy of many dimensional levels, and each dimensional level may include hundreds of member data elements, any one of which may be data of special interest.
There have been different approaches to identify the data with special values in a multidimensional database.
U.S. Pat. No. 7,065,534 uses curve fitting techniques to detect data anomalies in a “data tube” from a data perspective when the data deviates substantially from a predicted value established by a curve fitting process, such as a linear function applied to the data tube. A threshold value can also be employed to determine the degree of deviation necessary before a data value is considered anomalous.
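The general idea of flagging values that deviate from a fitted curve by more than a threshold can be illustrated with an ordinary least-squares line. This is a generic sketch and is not the method claimed in the cited patent: the function names, the choice of a straight-line fit over evenly spaced positions, and the sample series are all assumptions.

```python
def linear_fit(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Requires at least two values.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_anomalies(values, threshold):
    """Return the positions whose deviation from the fitted line exceeds threshold."""
    slope, intercept = linear_fit(values)
    return [
        i
        for i, y in enumerate(values)
        if abs(y - (slope * i + intercept)) > threshold
    ]


# Placeholder series (assumption): a steady trend with one outlying value.
series = [10.0, 12.0, 14.0, 40.0, 18.0, 20.0]
flagged = find_anomalies(series, threshold=10.0)
```

In this sketch the outlying value also pulls the fitted line toward itself, so the threshold has to be chosen with that effect in mind.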
U.S. Pat. No. 6,094,651 teaches a method for locating data anomalies in a dimensional data cube that includes the steps of associating a surprise value with each cell of a data cube, and indicating a data anomaly when the surprise value associated with a cell exceeds a predetermined exception threshold. The surprise value associated with each cell is a composite value that is based on at least one of a Self-Exp value for the cell, an In-Exp value for the cell and a Path-Exp value for the cell. This method is limited to the surprise value for the cells within the multidimensional cube in question and does not attempt to determine the levels indirectly involved in the contribution to the data value of interest.
U.S. Pat. No. 6,654,754 describes a method for interpreting, explaining, and manipulating selected exceptions in one or more dimensions in multidimensional data by qualifying individual contributions for each dimension. A density threshold preset rule is assigned to each dimension. A density correction factor is determined for each contribution, and a density-corrected contribution is then determined from the density correction factor and the density threshold preset rule. The density-corrected contributions are normalized. Each dimension is then sorted according to the normalized, density-corrected contributions associated with that dimension. This method of sorting dimensions is also limited to the dimensions within the multidimensional cube in question.
There is therefore a need to discover the causality of the discovered business intelligence beyond the proximity of the data value of interest.
There is a need to discover the causality of the discovered business intelligence outside the context of the multidimensional data being navigated.
There is further a need to discover the causality of the business intelligence with predictability and reproducibility, irrespective of the business decision maker's experience with the business intelligence application.