Imposing meaningful hierarchical structure on large databases provides levels of abstraction that can be leveraged by both the computer and the analyst. These hierarchies can come from several different sources. Some hierarchies are provided by the inherent nature of the database. Others can be derived automatically by data mining algorithms, such as decision trees and clustering techniques, that classify the data. Part of the analysis task when dealing with automatically generated hierarchies is understanding and trusting the results. See, for example, Thearling et al., 2001, “Visualizing Data Mining Models” in Information Visualization in Data Mining and Knowledge Discovery, Fayyad, Grinstein and Wierse eds., Morgan Kaufmann, which is hereby incorporated by reference in its entirety.
FIG. 1 illustrates the hierarchy for a time dimension. Within the time dimension, there are four levels: “All”, “year”, “quarter”, and “month”. Simple hierarchies, like the one shown in FIG. 1, are commonly modeled using a star schema. The entire dimension hierarchy is represented by a single dimension table joined to the base fact table. In this type of hierarchy, there is only one path of aggregation. However, there are more complex dimension hierarchies where the aggregation path can branch. For example, a time dimension might aggregate from “day” to both “week” and “month.”
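The branching aggregation path described above can be sketched in a few lines of Python. This is only an illustrative sketch: the daily values, the `roll_up` function, and the use of ISO week numbering are assumptions made for the example and are not taken from the figures.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level measure values (not from the source figures).
daily = {
    date(1998, 1, 30): 5.0,
    date(1998, 1, 31): 7.0,
    date(1998, 2, 1): 3.0,
}

# Each parent level of "day" has its own key function. ISO week numbering
# is an assumption of this sketch; other week definitions are possible.
LEVEL_KEYS = {
    "week": lambda d: d.isocalendar()[:2],   # (ISO year, ISO week)
    "month": lambda d: (d.year, d.month),    # (year, month)
}

def roll_up(values_by_day, level):
    """Sum day-level values up to the requested parent level."""
    key_of = LEVEL_KEYS[level]
    totals = defaultdict(float)
    for day, value in values_by_day.items():
        totals[key_of(day)] += value
    return dict(totals)

by_week = roll_up(daily, "week")
by_month = roll_up(daily, "month")
print(by_week)
print(by_month)
```

In this sample the three days fall in a single week but in two different months, so weeks cannot be rolled up into months. That is why “day” must aggregate to “week” and “month” along separate paths.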
To provide another illustration of the concept of a star schema, consider the case in which one wishes to analyze monthly total product sales for a department store by breaking down the data by region and store. Raw data can come in the form of product managers' (FIG. 2) and regional managers' (FIG. 3) quarterly sales reports. Once the data has been collected and refined, it can reside in a large base table. In addition, there may be adjunct lookup tables. A star schema for this base data is shown in FIG. 4. The table schema of FIG. 4 is called a star schema because the central fact table is depicted as surrounded by each of the dimension tables that describe each dimension. In this example, the base sales data table is the fact table and each lookup table is a dimension table.
The stores, weeks, and products columns in the fact table in FIG. 4 contain numeric values. Fact tables can grow to huge numbers of rows. The lookup tables contain hierarchy information relating each store, week, and product with its higher-level aggregations. For example, store 1 in the base table of FIG. 4 connects with the “Store Lookup” table, where it has the name Ridgewood and rolls up to the Northeast region. Product 2 in the base table connects with the “Product Lookup” table, where it has the name olive oil soap and rolls up into the product type soap in the skin care products group. See, for example, Thomsen, 1997, OLAP Solutions: Building Multidimensional Information Systems, Wiley Computer Publishing, New York, which is hereby incorporated by reference in its entirety.
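As a minimal sketch of how such a star schema is queried, the following Python example builds a small in-memory SQLite database and rolls sales up from the fact table to region and product type through the lookup tables. The table names, column names, and sales amounts are hypothetical. Only the Ridgewood store and the olive oil soap product follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension (lookup) tables; names and columns are assumptions.
    CREATE TABLE store_lookup (store INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE product_lookup (product INTEGER PRIMARY KEY, name TEXT,
                                 type TEXT, grp TEXT);
    -- Fact table: numeric keys into each dimension plus one measure.
    CREATE TABLE sales (store INTEGER, week INTEGER, product INTEGER, amount REAL);
""")
conn.execute("INSERT INTO store_lookup VALUES (1, 'Ridgewood', 'Northeast')")
conn.execute(
    "INSERT INTO product_lookup VALUES (2, 'olive oil soap', 'soap', 'skin care')"
)
# Hypothetical sales amounts for store 1, product 2, in weeks 1 and 2.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 10.0), (1, 2, 2, 15.0)],
)

def sales_by_region_and_type(connection):
    """Join the fact table to its lookups and aggregate to higher levels."""
    return connection.execute("""
        SELECT s.region, p.type, SUM(f.amount)
        FROM sales AS f
        JOIN store_lookup AS s ON f.store = s.store
        JOIN product_lookup AS p ON f.product = p.product
        GROUP BY s.region, p.type
        ORDER BY s.region, p.type
    """).fetchall()

rollup = sales_by_region_and_type(conn)
print(rollup)  # [('Northeast', 'soap', 25.0)]
```

The fact table stores only the numeric keys, so the names and higher-level groupings are stored once in each lookup table, however many fact rows refer to them.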
The most common schemata found in databases are the star schema and the snowflake schema. Each schema has a fact table containing data items of interest (measures) in the analysis for which the database is built. These data items might be transaction amounts such as the amount invested in a mutual fund or the profit on a sales transaction. The fact table is surrounded by dimension tables containing detailed information used to summarize the fact table in different ways. An illustration of a star schema has been provided (FIG. 4). FIG. 5 illustrates a snowflake schema that includes hierarchy. The snowflake and star schemas provide a conceptual multidimensional view of the database. In this view, the database is a core set of measures characterized by a number of dimensions rather than a set of interrelated tables. This organization correlates directly with the typical analysis query that summarizes a few quantitative attributes (or measures) such as profit or sales by several characterizing attributes (or dimensions) such as product, location, or date over a large number of tuples. The primary differences between the star and snowflake schemas arise in how they model hierarchical structures on the dimensions.
When referring to values within a dimension hierarchy, a dotted notation can be used to specify a specific path from the root level “All” (FIG. 1) of the hierarchy down to the specified value. Specifically, to refer to a value on level m of a hierarchy, the dimension name is first optionally listed, then zero or more of the (m−1) intermediate ancestor values, and then finally the value on the mth level, all separated by periods. For example, the Jan node on the Month level in the time hierarchy that corresponds to January, 1998, can be referred to as 1998.Qtr1.Jan. When this notation is used, the reference is called a qualified value. When a value is simply described by its node value (without any path to the root node) the reference is called an unqualified value.
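The dotted notation can be illustrated with a short Python sketch that resolves a reference against a small time hierarchy. The hierarchy contents, the dimension name `Time`, and the `resolve` function are assumptions made for this example. For brevity the sketch handles only references to Month-level nodes.

```python
# Hypothetical time hierarchy: year -> quarter -> list of months.
TIME = {
    "1998": {"Qtr1": ["Jan", "Feb", "Mar"], "Qtr2": ["Apr", "May", "Jun"]},
    "1999": {"Qtr1": ["Jan", "Feb", "Mar"]},
}

def month_paths(hierarchy):
    """Yield the (year, quarter, month) path of every Month-level node."""
    for year, quarters in hierarchy.items():
        for quarter, months in quarters.items():
            for month in months:
                yield (year, quarter, month)

def resolve(reference, hierarchy=TIME, dimension="Time"):
    """Return the fully qualified values matched by a dotted reference."""
    parts = reference.split(".")
    if parts[0] == dimension:      # the dimension name is optional
        parts = parts[1:]
    *ancestors, value = parts      # zero or more ancestors, then the value
    matches = []
    for path in month_paths(hierarchy):
        if path[-1] != value:
            continue
        # Listed ancestors must occur, in order, among the node's ancestors.
        remaining = iter(path[:-1])
        if all(ancestor in remaining for ancestor in ancestors):
            matches.append(".".join(path))
    return matches

print(resolve("1998.Qtr1.Jan"))  # ['1998.Qtr1.Jan']
print(resolve("Time.1998.Jan"))  # ['1998.Qtr1.Jan']
print(resolve("Jan"))            # ['1998.Qtr1.Jan', '1999.Qtr1.Jan']
```

The last call shows why qualification matters: the unqualified value Jan matches a node in every year, whereas the qualified value 1998.Qtr1.Jan identifies exactly one node.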
One form of database is a relational warehouse, such as a structured query language (SQL) database. Relational warehouses organize data into tables. Each row in a table corresponds to a basic entity or fact and each column represents a property of that entity. See, for example, Thomsen, 1997, OLAP Solutions: Building Multidimensional Information Systems, Wiley Computer Publishing, New York. For example, a table may represent transactions in a bank, where each row corresponds to a single transaction. As such, each transaction can have multiple properties, such as the transaction amount, the account balance, the bank branch, and the customer. As used herein, a row in a table is referred to as a tuple or record, and a column in the relation is referred to as a field. Such tables are also referred to as relations. As such, a relation is defined as a database table that contains a set of tuples.
It is possible to create dimension tables and star schemas in relational warehouses. A single relational warehouse will contain many heterogeneous but interrelated tables. The fields (columns) within a table can be partitioned into two types: dimensions and measures. Dimensions and measures are similar to independent and dependent variables in traditional analysis. For example, the bank branch and the customer are dimensions, while the account balance is a measure.
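The distinction between dimensions and measures can be sketched in Python as follows. The transaction rows and the `summarize` function are hypothetical. The sketch treats the transaction amount as the measure and sums it for each combination of the chosen dimension values.

```python
from collections import defaultdict

# Hypothetical bank transactions: one dict per tuple (row) of the relation.
transactions = [
    {"branch": "Downtown", "customer": "A", "amount": 100.0},
    {"branch": "Downtown", "customer": "B", "amount": 50.0},
    {"branch": "Uptown", "customer": "A", "amount": 25.0},
]

def summarize(rows, dimensions, measure):
    """Sum a measure field for each distinct combination of dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # dimensions identify the group
        totals[key] += row[measure]              # the measure is aggregated
    return dict(totals)

by_branch = summarize(transactions, ["branch"], "amount")
by_branch_and_customer = summarize(transactions, ["branch", "customer"], "amount")
print(by_branch)
print(by_branch_and_customer)
```

Dimension fields act like independent variables that partition the tuples, and the measure field acts like the dependent variable that is aggregated within each partition.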
Business intelligence requires tools for interactive visualization of multi-dimensional databases. The prior art provides a number of approaches to visualizing such information. For example, FIG. 17 illustrates a screen shot from a Microsoft Analysis Services interface to a multi-dimensional database that includes a Time dimension having the levels year and quarter. In another example, FIG. 18 illustrates a Microsoft Excel Pivot-Table interface to a multi-dimensional database. Although the software programs illustrated in FIGS. 17 and 18 are useful, they are often unsatisfactory. Such interfaces restrict the construction of the table so that levels from a single dimension must appear on the same axis (e.g., the rows or columns) and must be in their natural hierarchical order. Thus, in FIG. 18, if a user attempts to move Year to the rows and leave Quarter on the columns, the interface will move all levels (Years as well as Quarters) to the rows. Similarly, it is not possible to skip levels in the defined hierarchies. For example, if Year and Month were to appear on the axis, then Quarter must also appear if the hierarchical source of the data includes such a level. These limitations are not just cosmetic constraints. They limit the types of analysis that can be performed on the data using such graphical interfaces.
Based on the above background, what is needed in the art are improved methods and graphical interfaces for visualizing data that includes dimensions and measures.