1. Field of the Invention
The present invention relates to the field of computing. More particularly, the present invention relates to a method for interactively exploring a data cube for anomalous data.
2. Description of the Related Art
On-Line Analytical Processing (OLAP) is a computing technique for summarizing, consolidating, viewing, applying formulae to, and synthesizing data according to multiple dimensions. OLAP software enables users, such as analysts, managers and executives, to gain insight into the performance of an enterprise through rapid access to a wide variety of data views that are organized to reflect the multidimensional nature of the enterprise performance data. An increasingly popular data model for OLAP applications is the multidimensional database (MDDB), which is also known as the data cube. OLAP data cubes are predominantly used for interactive exploration of performance data to find regions of anomalies, also referred to as exceptions or deviations, in the data. Problem areas and/or new opportunities are often identified when an anomaly is located.
To create an MDDB from a collection of data, a number of attributes associated with the data are selected. Some of the attributes are chosen to be metrics of interest and are each referred to as a "measure," while the remaining attributes are referred to as "dimensions." Dimensions usually have associated "hierarchies" that are arranged in aggregation levels providing different levels of granularity for viewing the data.
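The construction described above can be illustrated with a minimal sketch. All record values, attribute names, hierarchy levels, and the helper build_cube are hypothetical and chosen only for illustration; the sketch assumes the measure is aggregated by summation, which the text does not specify.

```python
from collections import defaultdict

# Hypothetical source records; every attribute name and value is invented.
records = [
    {"product": "cola", "store": "S1", "month": "1995-01", "sales": 120.0},
    {"product": "cola", "store": "S2", "month": "1995-01", "sales": 80.0},
    {"product": "chips", "store": "S1", "month": "1995-01", "sales": 50.0},
    {"product": "chips", "store": "S1", "month": "1995-02", "sales": 70.0},
]

MEASURE = "sales"                            # attribute chosen as the measure
DIMENSIONS = ("product", "store", "month")   # remaining attributes are dimensions

# One hierarchy per dimension: each level maps a base value to its
# representative at that aggregation level (assumed two levels each).
HIERARCHIES = {
    "product": {
        "product": lambda p: p,
        "category": lambda p: {"cola": "beverages", "chips": "snacks"}[p],
    },
    "store": {
        "store": lambda s: s,
        "region": lambda s: {"S1": "west", "S2": "east"}[s],
    },
    "month": {
        "month": lambda m: m,
        "year": lambda m: m[:4],
    },
}


def build_cube(rows, levels):
    """Sum the measure at the chosen hierarchy level of each dimension."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(HIERARCHIES[d][levels[d]](row[d]) for d in DIMENSIONS)
        cube[key] += row[MEASURE]
    return dict(cube)


# The same data viewed at two levels of granularity.
detailed = build_cube(records, {"product": "product", "store": "store", "month": "month"})
coarse = build_cube(records, {"product": "category", "store": "region", "month": "year"})
```

In this sketch the detailed view keeps one cell per product, store, and month, while the coarse view merges the two chips rows into a single cell for snacks, west, and 1995.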
Exploration typically begins at the highest level of a dimensional hierarchy. The analyst then "drills down" to the lower levels of the hierarchy by examining the aggregated values and visually identifying interesting values among them. For example, drilling down along a product dimension from a product category to a product type may identify product types exhibiting an anomalous sales behavior. Continued drill-down from the product type may identify individual products causing the anomalous sales behavior. If exploration along a particular path does not yield interesting results, the path is "rolled up" and another branch is examined. A roll-up may return to an intermediate level for drilling down along another branch, or may return to the top level of the hierarchy so that a drill-down can continue along another dimension.
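The drill-down and roll-up navigation described above may be sketched as follows for a single product dimension. The rows, the three-level hierarchy (category, type, product), and the helper names view, drill_down, and roll_up are hypothetical; the measure is assumed to be a change in sales relative to a prior period.

```python
from collections import defaultdict

# Hypothetical rows: (category, product type, product, change in sales).
ROWS = [
    ("beverages", "soda", "cola", -40.0),
    ("beverages", "soda", "lemonade", 2.0),
    ("beverages", "juice", "apple", 3.0),
    ("snacks", "chips", "salted", 1.0),
    ("snacks", "nuts", "almonds", -2.0),
]
LEVELS = ("category", "type", "product")


def view(path):
    """Aggregate the rows under `path` at the next level down.

    An empty path denotes the top of the hierarchy.
    """
    depth = len(path)
    if depth >= len(LEVELS):
        raise ValueError("already at the most detailed level")
    totals = defaultdict(float)
    for row in ROWS:
        if row[:depth] == path:
            totals[row[depth]] += row[3]
    return dict(totals)


def drill_down(path, member):
    """Descend one level into the chosen member."""
    return path + (member,)


def roll_up(path):
    """Return to the parent level."""
    return path[:-1]


# An analyst starts at the top and follows the value that looks unusual.
path = ()
by_category = view(path)             # beverages stands out
path = drill_down(path, "beverages")
by_type = view(path)                 # soda stands out
path = drill_down(path, "soda")
by_product = view(path)              # cola is the individual product
path = roll_up(path)                 # back to the product-type level
```

Each step depends on the analyst noticing the unusual aggregate by eye, which is the manual effort the passage describes.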
Besides being cumbersome, this "hypothesis-driven" exploration for anomalies has several shortcomings. For example, the search space is usually large: a typical data cube has 5-8 dimensions, with any particular dimension having hundreds of values and each dimension having a hierarchy that is 3-8 levels high, as disclosed by George Colliat, OLAP, relational, and multidimensional database systems, Technical report, Arbor Software Corporation, Sunnyvale, Calif., 1995. Consequently, an anomaly can be hidden in any of several million values of detailed data that has been aggregated at various levels of detail. Additionally, the higher-level aggregations at which an analysis typically begins may not be affected by an anomaly occurring below the starting level, either because multiple exceptions cancel one another or simply because of the large amount of aggregated data. Even when data is viewed at the same level of detail as that at which an anomaly occurs, the exception might be hard to notice.
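The cancellation effect mentioned above can be shown with a small numerical sketch. The sales figures, the notion of an "expected" baseline, and the 25% threshold are assumptions made purely for illustration; the text does not define how a deviation is measured.

```python
# Hypothetical expected and observed sales per product (invented figures).
expected = {"cola": 100.0, "lemonade": 100.0, "apple": 100.0, "orange": 100.0}
actual = {"cola": 160.0, "lemonade": 40.0, "apple": 101.0, "orange": 99.0}

THRESHOLD = 0.25  # assumed cutoff for calling a deviation an exception


def relative_deviation(observed, baseline):
    """Signed deviation of the observed value as a fraction of the baseline."""
    return (observed - baseline) / baseline


# Deviations at the detailed (per-product) level.
detail = {p: relative_deviation(actual[p], expected[p]) for p in expected}
flagged = [p for p, d in detail.items() if abs(d) > THRESHOLD]

# Deviation of the aggregate over all products.
aggregate = relative_deviation(sum(actual.values()), sum(expected.values()))
```

Here two products deviate strongly in opposite directions, so the aggregate deviation is zero and an analysis that starts from the aggregated value sees nothing unusual.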
What is needed is a way for conveniently performing an exploration of a data cube that ensures that abnormal data patterns are not missed at any level of data aggregation, regardless of the number of dimensions and/or hierarchies of the data.