A “data warehouse” is a well-known technique for storing large volumes of complex data in a manner that facilitates periodic as well as on-demand reporting. A data warehouse has been defined as a “collection of data, from a variety of sources, organized to provide useful guidance to an organization's decision makers”. It is also well-known to design data warehouses in accordance with a number of conventional schemas, including a “star” schema, as illustrated in FIG. 1. In the schema shown in FIG. 1, a fact table 102 references several dimension tables 104, 106, 108, 110. According to a simple illustrative example, the fact table 102 may store data relating to sales transactions in a retail store chain. The first dimension table 104 may correspond to a “date” dimension that stores date-related attributes (e.g., year, month, day_of_month, day_of_week, etc.) for the sales transactions. The second dimension table 106 may correspond to a “store” dimension including attributes that identify individual stores in the chain. (Such attributes may include store_number, city, state, country, region, etc.) The third dimension table 108 may correspond to an “employee” dimension, having attributes such as employee_name, employee_id, supervisor, department, etc. The fourth dimension table 110 may correspond to a “product” dimension, with attributes such as product_name, product_id, brand, product_category, etc.
The fact table 102 may store basic facts for each transaction, such as number of units sold and price per unit. Each entry in the fact table also contains the primary keys of entries in the dimension tables to tie the entry in the fact table to the corresponding dimension attribute values in the entries in the dimension tables.
The star schema illustrated in FIG. 1, and similar schemas, are examples of dimensional data modeling. One issue to be faced in dimensional data modeling is how to deal with so-called “slowly changing dimensions”. In the example described above relative to FIG. 1, the value of the “employee_name” attribute for a given employee may change on occasion if, for example, an employee's marital status changes. There are a number of conventional approaches for handling slowly changing dimensions (SCDs). According to “Type I” changes, the new data is simply written over the previous data entry in the dimension table. This has the disadvantage of wiping out all history.
According to “Type II” changes, a new row with a new primary key is inserted in the dimension table to reflect the change, with starting and ending effective dates included in the new row and the prior row.
“Type III” changes have conventionally been used to facilitate analysis based on either the new or prior attribute value. In this technique, an additional attribute (i.e., a new column) is added to the dimension table. In the example, described above, the new column in the employee dimension table 108 may be headed “original_employee_name”, and the “employee_name” attribute/column may be re-designated as “current_employee_name”. One disadvantage of Type III handling of SCDs is that the addition of new columns to the dimension table may require costly redesign of the schema itself. Further, providing additional columns to accommodate dimension value changes may significantly increase the size and cost of the data warehouse, particularly in actual typical implementations that include dozens or even hundreds of different attributes among a number of different dimension tables.