The present invention relates to relational database management systems and, more specifically, to techniques for storing multidimensional data in relational database management systems.
In the context of database systems, a xe2x80x9cdimensionxe2x80x9d is a list of values that provide categories for data. A dimension acts as an index for identifying values of a variable. For example, if sales data has a separate sales figure for each month, then the data has a MONTH dimension. That is, the data is organized by month. A dimension is similar to a key in a relational database. Data that is organized by two or more dimensions is referred to as xe2x80x9cmultidimensional dataxe2x80x9d.
Any item of data within a multidimensional variable can be uniquely and completely selected by specifying one member from each of the variable""s dimensions. For example, if a sales variable is dimensioned by MONTH, PRODUCT, and MARKET, specifying xe2x80x9cJanuaryxe2x80x9d for the MONTH dimension, xe2x80x9cStereosxe2x80x9d for the PRODUCT dimension, and xe2x80x9cEastern Regionxe2x80x9d for the MARKET dimension uniquely specifies a single value of the variable. Thus, dimensions offer a concise and intuitive way of organizing and selecting data for retrieval, updating, and performing calculations.
Multidimensional data may be stored in relational database systems (xe2x80x9cROLAPxe2x80x9d systems) or in specialized, xe2x80x9cmultidimensionalxe2x80x9d database systems (xe2x80x9cMOLAPxe2x80x9d systems). Multidimensional database systems provide structures and access techniques specifically designed for multidimensional data, and therefore provide relatively efficient storage and access to multidimensional data. However, when stored in specialized multidimensional database systems, only applications that are specially built to interact with those multidimensional database systems are able to access and manipulate the data.
On the other hand, when stored in relational database systems, all applications that support interaction with relational databases have access to the data. Such database applications communicate with the relational database system by submitting commands that conform to the database language supported by the relational database system, the most common of which is the Structured Query Language (SQL).
Relational database systems store data in the form of related tables, where each table has one or more columns and zero or more rows. The conventional mechanism for storing multidimensional data in a relational database system is to store the data in tables arranged in what is referred to as a star schema. In relational database systems, a star schema is distinguished by the presence of one or more relatively large tables and several relatively smaller tables. Rather than duplicating the information contained in the smaller tables, the large tables contain references (foreign key values) to rows stored in the smaller tables. The larger tables within a star schema are referred to as xe2x80x9cfact tablesxe2x80x9d, while the smaller tables are referred to as xe2x80x9cdimension tablesxe2x80x9d. FIG. 1 illustrates an exemplary star schema with two dimensions.
Referring to FIG. 1, it illustrates a database 100 that includes tables 102, 104 and 106. Table 102 is named xe2x80x9cstorexe2x80x9d and contains information about each of the stores in which sales may occur. Each row in store table 102 contains a unique store-id and information about the particular store that corresponds to the store-id. Table 104 is named xe2x80x9cproductxe2x80x9d and contains information about each type of product that may be sold in any of the stores. Each row in product table 104 contains a unique product-id and information about the particular product.
Table 106 is named xe2x80x9csalexe2x80x9d and contains information about each sale in each of the stores represented in the store table 102. Each row in sale table 106 includes a dollar amount, a store-id to indicate the store at which the sale was made, a product-id to indicate the product sold in the sale, and the date of the sale. Typically, the number of sales will be vastly greater than both the number of stores at which the sales are made and the number of products carried by the stores. Detailed information about the store and product involved in a sale transaction does not have to be stored in the rows of table 106 because such detailed information is available in tables 102 and 104, respectively. Instead, the rows of table 106 simply contain values (store-ids and product-ids) that reference information stored in the other tables 102 and 104. Therefore, tables 102, 104 and 106 constitute a star schema in which table 106 is the fact table and tables 102 and 104 are dimension tables.
The data stored in fact table 106 only has two dimensions, and therefore fact table 106 only has two columns dedicated to storing foreign key values for those dimensions. In general, a fact table must dedicate one column for storing foreign key values for each of the dimensions associated with the multidimensional data stored in the fact table. Thus, a fact table that stores data associated with twenty dimensions would have to dedicate twenty columns to the storage of foreign key values.
Storing multidimensional data within a relational database has two significant drawbacks. First, the fact table is significantly larger than it would have to be if it only had to store the multidimensional data itself. The massive size of the fact table, relative to the dimension data itself, is largely due to the need to store a foreign key value for each dimension for each multidimensional value. Second, the rows within a conventional fact table have no particular order. Consequently, multidimensional values that are closely related to each other conceptually may be stored relatively randomly throughout the entire fact table. This leads to inefficiencies because multidimensional data that is conceptually related to each other are frequently accessed and manipulated as a group.
An alternative approach to managing multidimensional data in a relational database involves storing the data in relational files but maintaining all multidimensional structure, metadata, administration, and access control using multidimensional database system techniques. Accessing relationally-stored data using multidimensional techniques poses numerous difficulties. For example, when all administration and access to the multidimensional data are controlled exclusively through the multidimensional database system engine, two database management systems must be administered. Further, database applications that access data using conventional relational commands (e.g. SQL commands) are unable to access the multidimensional data.
The approaches described above for storing multidimensional data in relational database systems demonstrate the tradeoffs made by prior approaches, which have either (1) sacrificed the benefits of multidimensional storage to enjoy the benefits of modern relational systems, such as conventional relational access, or (2) sacrificed the benefits of relational storage to attain the efficiency of multidimensional storage.
Techniques are provided which address the problems associated with the conventional approaches for storing multidimensional data in a relational database system. According to one embodiment, the many foreign key values of each row in the fact table are mapped to and replaced by a xe2x80x9creplacementxe2x80x9d value. A mapping function is provided that derives a replacement value from any given combination of foreign key values, and an inverse mapping function is provided to reproduce the combination of foreign key values given the replacement value. In one embodiment, a mapping function is selected such that the foreign key values that are close to each other. In one embodiment, the rows in the fact table are then stored within the fact table in the sorted order, thus causing values that are conceptually related to each other to be stored physically near other within the fact table.
Various techniques are provided for generating the replacement value from the foreign key values by subdividing the multidimensional cube that contains all the multidimensional values into smaller sub-cubes that are referred to as tiles. Variations on the tiling mechanism are provided, according to various embodiments. According to one embodiment, the cube is sub-divided into tiles based on the members of a particular level of a hierarchical dimension. According to another tiling approach, the tiles themselves may be subdivided into smaller tiles to create a hierarchy of tiles.