Friday 13 November 2015

TRAINING (DW-3) - DATA WAREHOUSE CONCEPTS - Part3 (Dimension Modeling, Fact & Dimension Tables)


DIMENSIONAL MODELING
A Dimensional Model is a database structure that is optimized for online queries and Data Warehousing tools. It is comprised of "FACT" and "DIMENSION" tables. 


DIFFERENT TERMS USED IN DIMENSIONAL MODELING

1.) FACT or MEASURE

A "fact" is a numeric value (analytic number)  that a business wishes to count or sum. For example Total Salary, Quarterly Sales etc.
> An additive numerical value that represents a business metric.
> You can have multiple measures within a fact table.  
> For example, if your fact table is used to track purchases internationally you might have measures for each type of currency.  
If you are building a fact table for the retail industry you might also have measures like cost, list price, average sale price

2.) DIMENSION
A "dimension" is essentially an entry point for getting at the facts. Dimension generally refers to different aspects of a business for example employee, sales, customers etc.

3.) GROUPS (Tables)
Dimension are normally broken down into GROUPS (tables).

4.) ATTRIBUTES (columns)
Dimension are broken down into GROUPS (tables) and they contain several ATTRIBUTES (columns).

5.) DIMENSION TABLE
Provides the description of the FACTS/MEASURES (analytic numbers).
> Used to filter queries and to select data on the basis for example who, what, when, where and why behind the facts.
> H
ave many columns but a limited amount of rows
> Example of Dimension tables are products, employees, customers, time, and location etc.

DESIGNING A DIMENSION TABLE
  • Use a unique identifier integer column that is auto incremental as your PRIMARY KEY.  This is commonly known as a SURROGATE KEY.
  • Use the source’s (OLTP) primary key as an ALTERNATE/BUSINESS KEY.
  • Any additional ATTRIBUTES (columns) that describe the business entity


6.) DIMENSION HIERARCHIES
> Commonly defined as a structure to provide drill up and drill down capabilities.
> For example, in a time dimension you might have two separate hierarchies, one for fiscal year and another for calendar year.  These two dimension could contain the following attributes year, quarter, month, day.

7.) SLOWLY CHANGING DIMENSIONS
In order to handle the data change for data warehouse dimension's attribute, below could be options:

A.) Do nothing (type 0).  
> Not recommended.
> Can not guarantee history preservation 
> Least control over managing changed attributes.

B.) Overwrite old data with new data (type 1)
> Completely overwrites the previous attribute value.  
> Could be ideal for spelling error or the historical value of this column is not crucial. 
> Easiest method to implement but it can be hard to manage.

C.) Create multiple records with different key values (type 2)
> Provides you with the ability to have an unlimited amount of history.  
> Usually implemented by adding a start and end date column.
> If you have to make changes to past records this could require you to also update fact records.

D.) Create a new column for previous values (type 3)
This allows only a fixed amount of history to be retained.  If you create two extra columns for the team attribute you can only store three team values (the current value and last two previous values)

8.) FACT TABLE
> A table that joins dimension tables with measures.  
> Fact tables are huge as compared to dimension tables. Therefore, these tables must be normalized to be efficient.
> A fact table is normalized when it consists of the following two things:
- A composite primary key using the foreign keys to the dimension tables.
- Measure(s) used for numerical analysis.

DESIGNING A FACT TABLE
  • Create a COMPOSITE PRIMARY KEY using foreign keys of Dimension tables.
  • Add an additional column for the measure.
  • For example, To know "time worked" by employees, by location, by project and by task.  If you had a dimension for employees, location, project and task you would create a COMPOSITE PRIMARY KEY using these foreign keys and add an additional column for the "time worked" measure.

No comments:

Post a Comment