This application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
1. Field of Invention
This invention relates to visualizing multiple-dimension data.
2. Description of Related Art
With the decreasing cost of storage and increased bandwidth of networks, storing large volumes of fine grain data has become technically feasible and cost effective. In business environments, this fine grain data typically includes, for example, transactions, sales records, and/or customer information. This fine grain data is typically stored in warehouses or data marts. When properly analyzed, this fine grain data provides a rich analysis source for understanding customer behavior.
Transactions collected by operational systems are frequently stored in relational tables. For a variety of reasons, including data cleanliness, scalability, efficiency of the relational method, difficulty in building schemas, and computational complexity, analyzing, understanding, and making business decisions using raw relational tables is difficult. Unfortunately, the relational model and the standard interface of the structured query language (SQL) used to manipulate relational tables, as described in A Guide to the SQL Standard, C. J. Date et al., Addison-Wesley, Reading, Mass., 1997, are not well-suited for analysis tasks. When submitted against warehouses that have been engineered for fast transaction archiving, analysis queries frequently run extremely slowly. For example, a multi-million dollar warehouse may only be able to support one or two power analysis users.
One conventional approach for overcoming the analysis problem promulgated by business intelligence software vendors involves aggregating transactions into multi-dimensional databases, which are also referred to as “data cubes”. This is described in “Extending the database relational model to capture more meaning,” E. F. Codd, Association for Computing Machinery, 1997 and OLAP Solutions, E. Thomsen, John Wiley and Sons, New York, 1997.
The data cube, that is, the raw data structure of a multi-dimensional database, organizes information along a sequence of categories. The categorizing variables are called dimensions. The data, called the measures, is stored in cells. One common use of data cubes is to store aggregated sales transactions. In this case, for example, the cube dimensions might be product, store, department, customer number, region, month, while the measures might be cost of goods sold (COGS), sales, and profit. The dimensions are predefined indices into a cell of the data cube.
The measures in a cell are roll-ups over the transactions. The roll-ups or aggregations of the transactions are usually sums of the transactions. However, the roll-ups or aggregations of the transactions may include other functions such as averages, standard deviations, percentages, etc. For example, the values for the dimensions may be north, south, east, and west for the region dimension; shoes and shirts for the product dimension; and January, February, . . . , December for the month dimension. Then, the value in the data cube cell corresponding to “sales[north][shirts][Feb]” is the total sales of shirts for the northern region for the month of February.
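The roll-up described above can be illustrated with a minimal sketch. The transaction records, the tuple layout, and the helper name `build_cube` are assumptions made for illustration only; the sketch uses a simple sum as the aggregation function.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, month, sales amount).
transactions = [
    ("north", "shirts", "Feb", 120.0),
    ("north", "shirts", "Feb", 80.0),
    ("north", "shoes", "Feb", 200.0),
    ("south", "shirts", "Jan", 50.0),
]

def build_cube(rows):
    """Sum transaction amounts into cells keyed by (region, product, month)."""
    cube = defaultdict(float)
    for region, product, month, amount in rows:
        cube[(region, product, month)] += amount
    # Return a plain dict so that unpopulated cells are simply absent.
    return dict(cube)

sales = build_cube(transactions)
print(sales[("north", "shirts", "Feb")])  # 200.0
```

In this sketch, the dimension values act as indices into a cell, and cells with no transactions are simply not stored.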
Dimensions frequently decompose hierarchically. For example, the hierarchical levels for the time dimension may be year, quarter, month, day, and hour. Standard implementations of multi-dimensional databases frequently support hierarchical navigation. Thus the value for the data cube cell “sales[north][shirts][First Quarter]” is the sum of the data cube cells “sales[north][shirts][January]”, “sales[north][shirts][February]” and “sales[north][shirts][March]”. Not every cell needs to be populated, nor do the hierarchies need to be symmetric for all dimensions.
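Hierarchical navigation of this kind can be sketched as follows. The month-level cell values, the `MONTH_TO_QUARTER` mapping, and the function name `roll_up_to_quarter` are hypothetical and shown only to illustrate summing month cells into a quarter cell.

```python
# Hypothetical month-level cells keyed by (region, product, month).
monthly_sales = {
    ("north", "shirts", "January"): 100.0,
    ("north", "shirts", "February"): 200.0,
    ("north", "shirts", "March"): 150.0,
    ("north", "shoes", "January"): 75.0,
}

# Assumed hierarchy: each month belongs to exactly one quarter.
MONTH_TO_QUARTER = {
    "January": "First Quarter",
    "February": "First Quarter",
    "March": "First Quarter",
}

def roll_up_to_quarter(cells):
    """Sum month-level cells into quarter-level cells."""
    quarterly = {}
    for (region, product, month), value in cells.items():
        key = (region, product, MONTH_TO_QUARTER[month])
        quarterly[key] = quarterly.get(key, 0.0) + value
    return quarterly

quarterly_sales = roll_up_to_quarter(monthly_sales)
print(quarterly_sales[("north", "shirts", "First Quarter")])  # 450.0
```

Because only populated cells are visited, the sketch tolerates sparse cubes, such as the shoes entry that has data for January only.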
The standard interface for understanding and manipulating data cubes is called a “pivot table” or “cross tab”. Although there are variations among particular vendors' implementations, FIG. 1 shows an example of a Microsoft Excel™ pivot table. As shown in FIG. 1, in a Microsoft Excel™ pivot table 10, the cells are arranged in a “row by column by page” grid, with only one page being displayed at any time. The values of the column dimension 20 (the “product” dimension in FIG. 1), the row dimension 30 (the “state” dimension in FIG. 1), and the page dimension 40 (the “QTR” dimension in FIG. 1) are used to index the table cells and adjust the displayed page. In the pivot table 10 shown in FIG. 1, each cell contains five measures: sales 50, expenses 51, profit 52, cost of goods sold (COGS) 53, and marketing 54. For each measure, the margins 55 are totaled along the edges. The grand totals appear in the lower right hand corner (not shown in FIG. 1). The column dimension 20, i.e., the product dimension, is organized into a two-level hierarchy that includes a higher-level “product_type” dimension 21 and the lower-level “product” dimension 22 nested within the “product_type” dimension 21.
Pivot tables are implemented as interactive textual reports, i.e., textual reports that can be manipulated. Standard pivot table manipulations include assigning dimensions to the rows, columns, and pages using menus, tool bars, and wizards; navigating hierarchical dimensions by collapsing or expanding the hierarchies; and aggregating results across different dimensions.
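As a rough illustration of assigning dimensions to rows and columns and aggregating across the remaining dimensions, consider the following sketch. The function `pivot`, the dimension names, and the cell values are assumptions for illustration and do not reflect any particular vendor's implementation.

```python
def pivot(cells, dim_names, row_dim, col_dim):
    """Arrange cube cells in a row-by-column grid, summing over all other dimensions."""
    r = dim_names.index(row_dim)
    c = dim_names.index(col_dim)
    table = {}
    for key, value in cells.items():
        grid_cell = (key[r], key[c])
        table[grid_cell] = table.get(grid_cell, 0.0) + value
    return table

# Hypothetical cube cells keyed by (region, product, quarter).
dims = ("region", "product", "quarter")
cells = {
    ("north", "shirts", "Q1"): 10.0,
    ("north", "shirts", "Q2"): 20.0,
    ("south", "shoes", "Q1"): 5.0,
}

# Region on the rows, product on the columns; quarters are aggregated away.
by_region_product = pivot(cells, dims, "region", "product")

# Re-pivot: quarter on the rows, region on the columns; products are aggregated away.
by_quarter_region = pivot(cells, dims, "quarter", "region")
```

Re-pivoting changes only which dimensions index the grid; the underlying cells are unchanged.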
Trellis displays, as disclosed in “The design and control of trellis display”, W. S. Cleveland et al., Journal of Computational and Statistical Graphics, 5:123-155, 1996, extend the small multiples disclosed in The Visual Display of Quantitative Information, E. R. Tufte, Graphics Press, Cheshire, Conn., 1983, by arranging sequences of tiled panels in rows and columns. In some cases, the tiles may span multiple pages. When applied to show slices of multi-dimensional data with the dimensions controlling the row, column, and page in the layout, Trellis displays are particularly useful for discovering interactions among the measures. However, using Trellis displays to study cubes, while promising, frequently results in tens to hundreds of pages of printed graphs that must be manually studied.
Unfortunately, pivot table reports are hard to understand, particularly for large tables. For example, after carefully reviewing the pivot table shown in FIG. 1, even if the font were readable, about the only thing that could be determined is that certain products are not sold in certain states. Users usually find it difficult, if not impossible, to see patterns, discover trends, identify what changed from quarter to quarter, and/or find relationships between various measures. Seemingly simple analysis tasks, such as identifying the three largest cells, locating the two rows with the smallest totals, or finding the biggest growth trends, are time consuming and tedious.
Accordingly, various exemplary embodiments of the multi-dimensional data visualization systems and methods according to this invention provide a number of tools for visual discovery in multi-dimensional data structures such as pivot tables. Each of the tools of the multi-dimensional data visualization systems and methods according to this invention targets a different aspect of visual analysis and discovery.
In various exemplary embodiments of the multi-dimensional data visualization systems and methods according to this invention, these tools are organized into a number of different perspectives. A perspective is a set of linked visual components, or “views”, that are displayed together on the same graphical user interface screen. The views in a perspective work together to enable a particular type of visual analysis. One such perspective focuses on visualizing a single measure over an entire pivot table or other multidimensional data structure. Another perspective shows two measures simultaneously.
These and other features and advantages of this invention are described in or are apparent from the following detailed description of various exemplary embodiments of the systems and methods according to this invention.