10 November 2018
A dimension in a database is a structure that differentiates between different sources of data so that similar information can be grouped together.These dimensional data can change slowly over time and is known as Slowly Chaning Dimensions(SCD).
A Slowly Changing Dimension is a dimension that stores and manages both current and historical data over time in a data warehouse. It is dimensions that change slowly over time, rather than changing on regular schedule, time-base. In data warehouse we need a dimensional attribute in order to report historical data. Some of the examples of these dimensions would be customer address, geograhy ,employee salary etc.It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
Most popular approaches to deal with SCD is given below .
Type 0 - The passive method In this method no special action is performed upon dimensional changes. Some dimension data can remain the same as it was first time inserted, others may be overwritten.
Type 1 - Overwriting the old value In a Type 1 SCD the new data overwrites the existing data. Thus the existing data is lost as it is not stored anywhere else. This is the default type of dimension you create. You do not need to specify any additional information to create a Type 1 SCD.
In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten be the new one. This type is easy to maintain and is often use for data which changes are caused by processing corrections(e.g. removal special characters, correcting spelling errors).
Type 2 - Creating a new additional record In this methodology all history of dimension changes is kept in the database. You capture attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain as attributes the natural key(or other durable identifier). Also ‘effective date’ and ‘current indicator’ columns are used in this method.
A Type 2 SCD retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active.
Type 3 - Adding a new column Adding a new column. In this type usually only the current and previous value of dimension is kept in the database. The new value is loaded into ‘current/new’ column and the old one into ‘old/previous’ column. Generally speaking the history is limited to the number of column created for storing historical data. A Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous value and the current value of the selected attribute. When the value of any of the selected attributes changes, the current value is stored as the old value and the new value becomes the current value.
Type 4 - Using historical table In this method a separate historical table is used to track all dimension’s attribute historical changes for each of the dimension.
Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)