In data management and data warehousing, a dimension refers to logical groupings of data such as personnel data, geographical location, customer information, or product information. Slowly Changing Dimensions (SCD) are dimensions that have data that change over time. Conventionally, there are three methods to handle slowly changing dimensions, designated as Type 1, Type 2 and Type 3. Which technique to use depends on business requirements, as each technique behaves differently.
Using Type 1 SDC's, the old data is overwritten with new data. The dimension attribute reflects the latest state, but any historical values are lost. This technique is often utilized for processing corrections, or when the old value has no business significance. While the type 1 technique appears simple to implement, unexpected complications may arise with type 1 attribute changes. For example, if fact data have been previously aggregated based on the type 1 attribute, when the dimension value is overwritten, then any ad hoc summarization based on the new value will no longer tie to the pre aggregated data.
With a Type 2 change, a new row with a new surrogate primary key is inserted into the dimension table to reflect the new attribute values. Both the prior and new rows include the natural key (or durable identifier) as an attribute, along with a row effective date, row expiration date, and current row indicator. The type 2 technique is used when a meaningful change to the dimension has taken place and it is appropriate to partition history by the changed attribute. Each surrogate key corresponds to a unique version of the dimension row that was true for a span of time. Thus, each surrogate key is used in the corresponding fact rows during the time when the particular instance was valid.
The Type 3 technique is usually utilized when the change, like the redrawing of sales district boundaries or redefinition of product category boundaries, is a “soft” change. In other words, although the change has occurred, it is still logically possible to act as if the change had not occurred. For example, a business entity may want to track sales performance with either the old or the new sales district definitions. In this case, history cannot be partitioned as in a Type 2 methodology, but rather, the Type 3 methodology simultaneously provides both the old attribute value and new value in the same dimension row. This allows users to choose between the two versions. With a type 3 response, a new column is added to the dimension table. The old attribute value is pushed into a “prior” attribute column and the new attribute value is overwritten in the existing column.
However, as noted above, the behaviors of each Type are different. As such, the methodology for processing and consuming information contained in warehouses where both types are used together can be complex, especially when the dimensional source data itself is equally complex.