1. Field of the Invention
The present invention relates generally to identifying and explaining exceptions in data to aid in data analysis.
2. Description of the Related Art
Multidimensional databases, particularly those having online analytical processing (OLAP), are used to analyze historical data for decision support. Multidimensional databases consist of two kinds of attributes: measures (numerical values such as sales volume) and dimensions (e.g., product name, store, time period, etc.). Each dimension in turn can include a hierarchy. For example, a hierarchy on the "product" dimension might be "specific product code", then "product type", then "product category".
To analyze data in a multidimensional database, an analyst might want to identify anomalies (referred to herein as "exceptions") in the data, because such exceptions can provide interesting information. As a simple example, sales of a particular product, such as oysters, might peak in a particular region at a particular time of year. Identifying such an exception can provide the analyst with useful insights on how to improve marketing, product display, and so on.
In OLAP databases, an analyst can interactively explore a multidimensional database to find regions of data exceptions using simple navigational operations, including drill-down (to show more detail), roll-up (to show less detail), pivot (to change axis dimensions), and so on. In any case, the procedure is manual: an analyst might have to experiment, using trial and error, with many possible data displays before finding interesting exceptions. The tediousness of this can be better appreciated with the understanding that a multidimensional database might include up to eight dimensions, each with a hierarchy of eight levels, with each level including perhaps hundreds of member data elements, any one of which might be an exceptional element.
To address this shortcoming, the present inventors have provided a way to identify exceptions in [preferably "U.S. Pat. No. 6,094,651, issued Jul. 25, 2000"] [or less preferably "Sarawagi et al., 'Discovery-Driven Exploration of OLAP Data Cubes', Proc. of the 6th Int'l Conf. on Extending Database Technology (EDBT), 1998"], incorporated herein by reference. As disclosed therein, a model uses an equation that takes into account the effects and interactions of multiple dimensions and nestings of hierarchies to compute an anticipated value of a data element in the context of its position in the database, and the model combines trends along different dimensions to which the data element belongs to determine whether the element is exceptional.
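As an illustration only, the idea of comparing an actual value against an anticipated value can be sketched for a two-dimensional table. The sketch below assumes a simple additive model (overall average plus a row effect plus a column effect) and a standardized-residual threshold; neither the model form, the function name `find_exceptions`, nor the threshold value is taken from the incorporated reference, whose equation also covers more dimensions and nested hierarchies.

```python
from statistics import mean, pstdev


def find_exceptions(table, threshold=2.5):
    """Flag cells whose value departs strongly from an additive estimate.

    Assumption: anticipated = overall + row effect + column effect.
    This is a simplified stand-in, not the referenced exception equation.
    """
    rows, cols = len(table), len(table[0])
    overall = mean([v for row in table for v in row])
    row_eff = [mean(row) - overall for row in table]
    col_eff = [mean([table[i][j] for i in range(rows)]) - overall
               for j in range(cols)]

    # Residual = actual value minus anticipated value for every cell.
    residuals = [[table[i][j] - (overall + row_eff[i] + col_eff[j])
                  for j in range(cols)] for i in range(rows)]

    spread = pstdev([r for row in residuals for r in row])
    if spread == 0:
        return []  # every cell matches its anticipated value

    # A cell is flagged when its standardized residual exceeds the threshold.
    return [(i, j) for i in range(rows) for j in range(cols)
            if abs(residuals[i][j]) / spread > threshold]


# Hypothetical sales table: one cell is far above its surroundings.
sales = [
    [10, 10, 10, 10],
    [10, 10, 26, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
print(find_exceptions(sales))
```

In this sketch a cell is judged in the context of both its row and its column, so a value that is merely part of a uniformly high row is not flagged on that basis alone.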
As recognized by the present invention, it can be important to explain why an element is indicated as being exceptional so that the analyst more fully understands the presentation, particularly when more than two dimensions are being analyzed. More specifically, an analyst might be viewing, in a display showing first and second dimensions, a data element duly marked as being exceptional because of an anomalous value in a third dimension. The analyst would know that the data element is exceptional but because the dimensions being viewed are not those along which the anomalous value occurs, it would not be apparent to the analyst why the element is exceptional. As further understood herein, explaining the reason a data element is exceptional is not trivial because, as intimated above, the preferred model that is used for finding the exceptions in the first place involves one monolithic equation that combines the effects of multiple levels of data element aggregations along different dimensions. Fortunately, the present invention has considered the above-noted consequence and has provided the solution disclosed herein.
A computer is disclosed that is programmed to identify at least two dimensions for display of data exceptions therein. The logic embodies a method that includes providing an exception equation, and identifying at least two maximal terms in the equation. The method embodied by the logic also includes identifying at least two dimensions along which the maximal terms are aggregated, and displaying the data exceptions using the two dimensions.
In a preferred implementation, the exception equation identifies at least one exceptional element characterized by an actual residual that in turn is characterized by a difference from an anticipated value, with the difference defining a first direction. With this in mind, the logic includes generating a simple version of the exception equation using only the maximal terms.
As set forth in detail below, the logic further identifies sets of candidate maximal terms. Also, for each set of candidate maximal terms, the logic determines whether the terms maintain an exceptional status of the exceptional element in the first direction, when the terms are used in the simple version of the exception equation. Moreover, the act of identifying at least two maximal terms can include, for each set of candidate maximal terms, determining a simple residual when the candidate maximal terms are used in the simple version of the exception equation. The selected set of maximal terms is identified as being the set of candidate maximal terms that both maintains an exceptional status of the exceptional element in the first direction when the terms are used in the simple version of the exception equation, and that results in a simple residual closest to the actual residual.
As intended herein, the selected set of maximal terms includes only two candidate maximal terms if the two maximal terms maintain an exceptional status of the exceptional element in the first direction when the terms are used in the simple version of the exception equation. Otherwise, the selected set can include more than two terms.
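The selection just described can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed logic: the exception equation is assumed to be a plain sum of named terms, an element is assumed to be exceptional when the magnitude of its residual exceeds a fixed threshold, and candidate sets are assumed to be all combinations of a given size. The function name `select_terms` and its parameters are hypothetical.

```python
from itertools import combinations


def select_terms(terms, actual, threshold, start_size=2):
    """Pick the smallest term set that keeps the element exceptional.

    terms: mapping of term name -> contribution to the anticipated value
           (assumed additive; hypothetical representation).
    Returns (selected term names, simple residual), or None if no set works.
    """
    actual_residual = actual - sum(terms.values())
    if abs(actual_residual) <= threshold:
        raise ValueError("element is not exceptional under the full equation")
    direction = 1 if actual_residual > 0 else -1  # the "first direction"

    # Try two-term sets first; grow the set only if no pair qualifies.
    for size in range(start_size, len(terms) + 1):
        best = None
        for names in combinations(sorted(terms), size):
            simple_residual = actual - sum(terms[n] for n in names)
            # Must stay exceptional AND in the same direction.
            if simple_residual * direction <= threshold:
                continue
            gap = abs(simple_residual - actual_residual)
            if best is None or gap < best[0]:
                best = (gap, names, simple_residual)
        if best is not None:
            return best[1], best[2]
    return None


# Hypothetical term values for one exceptional element.
terms = {"overall": 100.0, "product": 20.0, "region": -5.0, "time": 2.0}
print(select_terms(terms, actual=160.0, threshold=30.0))
```

With these illustrative numbers the full equation anticipates 117, giving an actual residual of 43. The pair of terms "overall" and "product" anticipates 120, giving a simple residual of 40, which remains above the threshold in the same direction and is closer to 43 than any other pair.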
In another aspect, a general purpose computer includes logic that undertakes method acts for explaining exceptions in data having at least two dimensions. These method acts include providing an exception equation, and explaining exceptions in the data based on maximal coefficients in the equation. A computer program device embodying the present logic is also disclosed.
In still another aspect, a computer programmed with logic for identifying at least two dimensions for display of data exceptions therein undertakes a method that includes providing an exception equation for identifying at least one exceptional element in a first dimension in a database having multiple dimensions. The element is characterized by at least one residual that in turn is characterized by a difference from an anticipated value. At least two dimensions are ranked based on the difference between a residual of the exceptional element in each dimension and other residuals in that dimension. The exceptional element is presented in accordance with the ranking act.
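One way to picture the ranking act is sketched below. The scoring rule (absolute difference between the element's residual and the average of the other residuals along the same dimension) is an assumption chosen for illustration, as are the function name `rank_dimensions` and the input layout; the text above does not fix a particular measure of difference.

```python
from statistics import mean


def rank_dimensions(residuals_by_dim, element_index_by_dim):
    """Order dimensions by how much the element stands out along each one.

    residuals_by_dim: dimension name -> list of residuals along that dimension.
    element_index_by_dim: dimension name -> position of the exceptional element.
    The scoring rule is an illustrative assumption.
    """
    scores = {}
    for dim, residuals in residuals_by_dim.items():
        idx = element_index_by_dim[dim]
        others = residuals[:idx] + residuals[idx + 1:]
        baseline = mean(others) if others else 0.0
        scores[dim] = abs(residuals[idx] - baseline)
    # Highest score first: these dimensions best expose the exception.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical residuals; the exceptional element sits at index 1 everywhere.
residuals = {
    "product": [0.5, 4.0, 0.3],
    "region": [3.8, 4.0, 4.1],
    "time": [-1.0, 4.0, 1.0],
}
position = {"product": 1, "region": 1, "time": 1}
ranking = rank_dimensions(residuals, position)
print(ranking[:2])  # the two dimensions to use for display
```

Here the element looks ordinary along "region", where its neighbors have similar residuals, so that dimension ranks last and would not be chosen for the display.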
In yet another aspect, a computer is programmed with logic for identifying at least two dimensions for display of data exceptions therein. An exception equation is provided for identifying at least one exceptional element in a first dimension in a database having multiple dimensions, with the element being characterized by at least one residual and with the residual being characterized by a difference from an anticipated value. The residual defines a sign. The computer then determines at least one largest magnitude coefficient in the equation having a sign opposite to the sign of the residual. The exceptional element is presented using the dimension associated with the coefficient in accordance with the determining act.
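The determining act of this last aspect can be sketched in a few lines. The sketch assumes that each coefficient of the equation is associated with exactly one dimension and is supplied as a plain mapping; the function name `dimension_to_display` and the sample values are hypothetical.

```python
def dimension_to_display(coefficients, residual):
    """Return the dimension whose coefficient is largest in magnitude
    among those with a sign opposite to the residual, or None.

    coefficients: dimension name -> coefficient value (assumed layout).
    """
    # A product below zero means the two values have opposite signs.
    opposite = {dim: c for dim, c in coefficients.items() if c * residual < 0}
    if not opposite:
        return None
    return max(opposite, key=lambda dim: abs(opposite[dim]))


# Hypothetical coefficients for one exceptional element.
coefficients = {"product": 0.8, "region": -1.5, "time": -0.4}
print(dimension_to_display(coefficients, residual=2.0))
```

With a positive residual, only the negative coefficients are considered, and "region" is selected because its coefficient has the largest magnitude among them.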