Data warehouses typically contain two major types of data elements available for analysis: dimensions and measures. Each dimension is tied to a categorical attribute such as product, market, time, channel, scenario, or customer. Given a dimension, every item in a data set can be categorized according to that dimension. A dimension may be described as a categorical attribute or a categorical field. A measure represents a data field that is associated with particular dimension categories (i.e., dimension values) and that can be used for calculations such as summation and averaging. A measure may be described as a continuous target. For example, the average amount of money customers spent in a given store can be calculated from the amount of customer spending (the measure) and the store dimension.
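The store example can be made concrete with a short sketch. It assumes a hypothetical list of transactions, each a tuple of (store, customer, amount spent), where "store" plays the role of the dimension and "amount spent" the measure. The helper `average_by_dimension` is illustrative only; it groups rows by one categorical attribute and averages the measure within each category.

```python
from collections import defaultdict

# Hypothetical transactions: (store, customer, amount_spent).
# "store" is the dimension (categorical attribute); "amount_spent" is the measure.
transactions = [
    ("Store A", "c1", 40.0),
    ("Store A", "c2", 60.0),
    ("Store B", "c3", 25.0),
    ("Store B", "c4", 35.0),
    ("Store B", "c5", 30.0),
]


def average_by_dimension(rows, dim_index, measure_index):
    """Group rows by one dimension and average the measure within each category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        category = row[dim_index]
        totals[category] += row[measure_index]
        counts[category] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}


print(average_by_dimension(transactions, dim_index=0, measure_index=2))
# {'Store A': 50.0, 'Store B': 30.0}
```

Swapping the average for a sum, count, or other aggregate only changes the last line of the helper; the grouping by dimension category stays the same.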
Data analysts today have to deal with increasingly large volumes of data. Attempting to find insights in large amounts of data (e.g., terabytes or petabytes), with many possible combinations of categorical attributes, is a difficult task. A common business scenario is identifying the relationship and influence of dimensions generated by categorical fields or categorical attributes on a continuous target. The goal for the data analyst is to determine which of the dimensions are relevant to the measure and, among those that are relevant, to discern the magnitude of their impact. Ultimately, the goal is to produce a series of aggregated tabular reports that illustrate measure-dimension relationships.
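One way to put a number on "relevance" and "magnitude of impact" is sketched below. The text above does not prescribe a method; this example swaps in the correlation ratio (eta squared), a standard statistic equal to the between-category sum of squares divided by the total sum of squares of the continuous target. The data and the function name `eta_squared` are hypothetical, chosen only to show how dimensions could be ranked against a single measure.

```python
from collections import defaultdict
from statistics import mean


def eta_squared(categories, target):
    """Correlation ratio: share of the target's variance explained by one dimension.

    Returns SS_between / SS_total, a value in [0, 1] (0.0 if the target is constant).
    """
    overall = mean(target)
    groups = defaultdict(list)
    for cat, y in zip(categories, target):
        groups[cat].append(y)
    ss_between = sum(len(ys) * (mean(ys) - overall) ** 2 for ys in groups.values())
    ss_total = sum((y - overall) ** 2 for y in target)
    return ss_between / ss_total if ss_total else 0.0


# Hypothetical data: one continuous target and two candidate dimensions.
spend = [40.0, 60.0, 25.0, 35.0, 30.0, 50.0]
store = ["A", "A", "B", "B", "B", "A"]
channel = ["web", "shop", "web", "shop", "web", "shop"]

scores = {"store": eta_squared(store, spend), "channel": eta_squared(channel, spend)}
for dim, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dim}: {score:.3f}")
# store: 0.706
# channel: 0.490
```

Under this assumed scoring, the store dimension explains more of the variation in spending than the channel dimension, so its report would be the more informative one to examine first. Real tools may use other statistics (for example, significance tests) for the same purpose.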
The following is an example 2-dimensional table:
X1 \ X2 | 1 | 2 | . . . | S
1 | (1, 1) | (1, 2) | . . . | (1, S)
2 | (2, 1) | (2, 2) | . . . | (2, S)
. . . | . . . | . . . | . . . | . . .
R | (R, 1) | (R, 2) | . . . | (R, S)
In the example 2-dimensional table, suppose dimension X1 has R categories (1, . . . , R) and dimension X2 has S categories (1, . . . , S). For a 2-dimensional table, the cells in the first column and the cells in the first row may be described as “dimension cells” for dimension X1 and dimension X2, respectively. A category may be described as a value or label of a dimension cell. The remaining cells in the table, each formed by combining one category of X1 with one category of X2, may be described as “table cells”; table cell (r, s) contains statistics about the continuous target for category r of X1 and category s of X2.
That is, dimension cells may be said to correspond to categories of the matching categorical attribute, while table cells may be said to correspond to combinations of categories from categorical attributes matching different dimensions.
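The relationship between dimension cells and table cells can be sketched in code. The example below assumes hypothetical records of the form (x1, x2, y), where x1 is a category of X1, x2 a category of X2, and y the continuous target. The function `two_way_table` is illustrative: the sorted category lists play the role of the dimension cells, and each (r, s) entry of the returned dictionary is a table cell holding a statistic (the mean by default) of the target.

```python
from collections import defaultdict
from statistics import mean


def two_way_table(records, stat=mean):
    """Build a 2-dimensional table from (x1, x2, y) records.

    Returns (rows, cols, table): rows/cols are the categories of X1/X2
    (the dimension cells), and table maps (r, s) to stat of the target
    values in that category combination (the table cells).
    """
    cells = defaultdict(list)
    for x1, x2, y in records:
        cells[(x1, x2)].append(y)
    rows = sorted({key[0] for key in cells})
    cols = sorted({key[1] for key in cells})
    table = {key: stat(values) for key, values in cells.items()}
    return rows, cols, table


# Hypothetical records with R = 2 categories of X1 and S = 2 categories of X2.
records = [
    (1, 1, 10.0), (1, 1, 14.0), (1, 2, 7.0),
    (2, 1, 3.0), (2, 2, 9.0), (2, 2, 11.0),
]
rows, cols, table = two_way_table(records)

# Print the layout: header row of X2 categories, then one row per X1 category.
print("X1\\X2", *cols, sep="\t")
for r in rows:
    print(r, *(table.get((r, s), "") for s in cols), sep="\t")
```

Passing a different function as `stat` (for example, `len` for counts or `sum` for totals) fills the same table cells with a different statistic; combinations with no records are left blank when printed.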
It is from relationships between dimensions and measures that analysts derive insights into their businesses. The challenge is navigating through potentially thousands of reports, each representing a possible measure-dimension combination.
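A quick count shows how the number of reports grows. The sketch assumes, purely for illustration, that each report crosses one measure with a set of one or two dimensions; the parameter values and the function name `count_reports` are hypothetical. With m measures and d dimensions, that gives m × (C(d, 1) + C(d, 2)) reports.

```python
from math import comb


def count_reports(n_measures, n_dimensions, max_dims_per_report=2):
    """Count reports pairing one measure with 1..max_dims_per_report dimensions."""
    dimension_sets = sum(comb(n_dimensions, k) for k in range(1, max_dims_per_report + 1))
    return n_measures * dimension_sets


# Hypothetical warehouse: 10 measures and 30 dimensions.
print(count_reports(n_measures=10, n_dimensions=30))
# 4650  (= 10 * (30 + 435))
```

Allowing three dimensions per report in the same hypothetical warehouse adds 10 × C(30, 3) = 40,600 more reports, which illustrates why exhaustive manual review quickly becomes impractical.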
Exploring data to detect important dimensions is difficult and tedious. Even with existing tools, data analysts need to be skilled in statistical analysis and data mining, and the volume of data exacerbates the problem even for experts. Organizations have invested heavily in data acquisition and storage technologies; they understand the value of data and believe in the business analytic proposition. However, there is a shortage of individuals capable of defining and executing a statistical analysis and extracting valuable information from it.