Business Intelligence Dictionary: Slowly Changing Dimensions-SCD explained


Things change. Dimensions are no exception. When a data designer plans a warehouse architecture, they have to plan ahead with a slowly changing dimensions strategy.
Slowly Changing Dimensions-SCD | Definition
Slowly Changing Dimension (SCD) is a term first introduced by the Kimball Group among its strategies for implementing a data warehouse. Ralph Kimball observes that dimensions normally change over time and offers several strategies for how a data designer may handle this process.

Slowly Changing Dimensions (SCD) are dimensions that change slowly over time, rather than on a regular, time-based schedule. In a data warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users to determine the correct dimension attribute value for a given date. Examples of such dimensions are customer, geography, and employee.

There are many approaches to dealing with SCD. The most popular are:

Type 0 - The passive method
Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column
Type 4 - Using historical table
Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6)

SCD Type 0 | Definition
SCD 0 is the simplest strategy. SCD 0 says: "Do not update the old row even if a change is detected. Preserve the initial state."
SCD Type 1 | Definition & Example
SCD 1 imposes a destructive update where old values are overwritten by the new state of a dimension.
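A minimal SQL sketch of an SCD 1 change, assuming a hypothetical DIM_CUSTOMER table keyed by CUSTOMER_ID:

-- SCD 1: the new value simply overwrites the old one; no history is kept
UPDATE DIM_CUSTOMER
SET CITY = 'Berlin'
WHERE CUSTOMER_ID = 1001;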
 
SCD Type 2 | Definition & Example
SCD 2 is the most commonly used SCD type. It is flexible and suits most business scenarios. SCD 2 imposes "expiration" of a record, which is handled by adding two new columns to the table - a start date and an end date, i.e. the period within which a dimension record is active. End dates for active records are hard-coded to a future date such as 9999-12-31.
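A minimal SQL sketch of an SCD 2 change, assuming the same hypothetical DIM_CUSTOMER table extended with a surrogate key and the two period columns:

-- SCD 2: "expire" the currently active row by closing its period
UPDATE DIM_CUSTOMER
SET END_DATE = CURRENT_DATE
WHERE CUSTOMER_ID = 1001
  AND END_DATE = DATE '9999-12-31';

-- ...then insert the new state as a fresh row with an open-ended period
INSERT INTO DIM_CUSTOMER (CUSTOMER_KEY, CUSTOMER_ID, CITY, START_DATE, END_DATE)
VALUES (2002, 1001, 'Berlin', CURRENT_DATE, DATE '9999-12-31');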
SCD Type 3 | Definition & Example
SCD 3 is rarely used due to the data redundancy it introduces. SCD 3 requires that a column be appended every time a dimension changes; this new column saves the previous state. SCD 3 also has a slightly optimized version that requires only two columns - Old State and New State - but the obvious shortcoming is that this way it can only store two states.
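A minimal SQL sketch of SCD 3, again with the hypothetical DIM_CUSTOMER table:

-- SCD 3: preserve the old state in an extra column, then overwrite the current one
ALTER TABLE DIM_CUSTOMER ADD PREVIOUS_CITY VARCHAR(50);

UPDATE DIM_CUSTOMER
SET PREVIOUS_CITY = CITY,
    CITY = 'Berlin'
WHERE CUSTOMER_ID = 1001;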
SCD Type 4 | Definition & Example
SCD 4 states that one table must hold all dimensions in their latest state, while a separate History Table holds the previous states.
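A minimal SQL sketch of SCD 4, assuming a hypothetical current table DIM_CUSTOMER and a history table DIM_CUSTOMER_HISTORY:

-- SCD 4: archive the outgoing state into the history table...
INSERT INTO DIM_CUSTOMER_HISTORY (CUSTOMER_ID, CITY, ARCHIVED_DATE)
SELECT CUSTOMER_ID, CITY, CURRENT_DATE
FROM DIM_CUSTOMER
WHERE CUSTOMER_ID = 1001;

-- ...then overwrite the current table with the latest state
UPDATE DIM_CUSTOMER
SET CITY = 'Berlin'
WHERE CUSTOMER_ID = 1001;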
SCD Type 6 | Definition & Example
SCD 6 is a hybrid of SCD 1, SCD 2 and SCD 3, which is where it gets its name (1+2+3=6). It imposes (SCD 2) a new row for each change with Start Date/End Date columns and a Y/N current-row flag, (SCD 3) an extra column that holds the current state, and (SCD 1) overwriting that current-state column in place across all of the row's history.
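As an illustration, a hypothetical SCD 6 customer row could evolve like this after a move from Munich to Berlin (the column names are only an example):

CUSTOMER_ID  CURRENT_CITY  HISTORICAL_CITY  START_DATE  END_DATE    CURRENT_FLAG
1001         Berlin        Munich           2010-01-01  2015-06-30  N
1001         Berlin        Berlin           2015-07-01  9999-12-31  Y

The change produced a new dated row (SCD 2), HISTORICAL_CITY keeps the state of each period (SCD 3), and CURRENT_CITY was overwritten in place across all rows (SCD 1).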
Note: You may also want to add a separate column, known as a "Delete Indicator". It is used whenever a dimension no longer has an active state but you do not want to delete it, in order to preserve its historical records.



Output Old Value On Update Option of Dynamic Lookup in Informatica


Here are some important properties of the Dynamic Lookup transformation:





Property: Dynamic Lookup Cache
Lookup Type: Relational, Flat File, Pipeline
Description: Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table. Use with the lookup cache enabled.

Property: Output Old Value On Update
Lookup Type: Relational, Flat File, Pipeline
Description: Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a row in the cache, it outputs null values. When you disable this property, the Integration Service outputs the same values out of the lookup/output and input/output ports. This property is enabled by default.



Note: To implement SCD Type 2 with a dynamic cache in Informatica, enable Output Old Value On Update in the Dynamic Lookup; otherwise the cache will end up in an inconsistent state.

Target Load Order/ Target Load Plan in Informatica


Target Load Order:

Target load order (or target load plan) is used to specify the order in which the Integration Service loads the targets. You can specify a target load order based on the source qualifier transformations in a mapping. If you have multiple source qualifier transformations connected to multiple targets, you can specify the order in which the Integration Service loads data into the targets.

Target Load Order Group:

A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. The Integration Service reads the sources within a target load order group concurrently, and it processes the target load order groups sequentially. A single mapping can, for example, contain two target load order groups.


Use of Target Load Order:

Target load order is useful when the data of one target depends on the data of another target. For example, the employees table data depends on the departments data because of the primary key-foreign key relationship, so the departments table should be loaded first and then the employees table. Target load order helps maintain referential integrity when inserting, deleting, or updating tables that have primary key and foreign key constraints, as sketched below.
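As a sketch of why the order matters, assume hypothetical DEPARTMENTS and EMPLOYEES target tables with such a constraint:

CREATE TABLE DEPARTMENTS (
    DEPT_ID   INT PRIMARY KEY,
    DEPT_NAME VARCHAR(50)
);

CREATE TABLE EMPLOYEES (
    EMP_ID    INT PRIMARY KEY,
    EMP_NAME  VARCHAR(50),
    DEPT_ID   INT REFERENCES DEPARTMENTS (DEPT_ID)  -- parent row must exist first
);

Loading EMPLOYEES before DEPARTMENTS would violate the foreign key, so the source qualifier feeding DEPARTMENTS must come first in the load order.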

Target Load Order Setting:

You can set the target load order or plan in the Mapping Designer. Follow the steps below to configure the target load order:

1. Log in to the PowerCenter Designer and create a mapping that contains multiple target load order groups.
2. Click on Mappings in the toolbar and then on Target Load Plan. The following dialog box will pop up, listing all the source qualifier transformations in the mapping and the targets that receive data from each source qualifier.


3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the load order.
5. Repeat steps 3 and 4 for other source qualifiers you want to reorder.
6. Click OK.

SQL SERVER – TRIM() Function


SQL Server does not have a function that can trim both leading and trailing spaces from a string, even though TRIM() is a very popular function in many languages. SQL Server does have LTRIM() and RTRIM(), which trim leading and trailing spaces respectively. I was expecting SQL Server 2005 to have a TRIM() function; unfortunately, SQL Server 2005 does not have it either. (Much later versions, starting with SQL Server 2017, finally added a built-in TRIM().) I have created a very simple UDF which does the same work.

FOR SQL SERVER 2000:
CREATE FUNCTION dbo.TRIM (@string VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    RETURN LTRIM(RTRIM(@string))
END
GO

FOR SQL SERVER 2005:
CREATE FUNCTION dbo.TRIM (@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    RETURN LTRIM(RTRIM(@string))
END
GO

Both of the above UDFs can be tested with the following script:
SELECT dbo.TRIM(' leading trailing ')
It will return the string in the result window as
'leading trailing'



Difference between Replace and Translate functions

The REPLACE function searches for a string and replaces it with the given string.
The TRANSLATE function works character by character: it replaces each occurrence of a character with its counterpart.

REPLACE :

The Replace function replaces one value in a string with another. 

For example, you can replace each occurrence of a letter with a matching number.

REPLACE(char, search_string, replace_string)

If a value for replace_string is not specified, the search_string value, when found, is removed.

The input can be any character datatype, such as CHAR, VARCHAR2, NCHAR, or CLOB.

SELECT REPLACE('COMPUTER','OM','AB')  FROM dual;

Output 
----------- 
CABPUTER
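As mentioned above, omitting replace_string removes every occurrence of the search string. For example:

SELECT REPLACE('COMPUTER','OM')  FROM dual;

Output
-----------
CPUTER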

TRANSLATE :

Translate does an orderly character-by-character substitution in a string.

TRANSLATE(string, if, then)

    SELECT TRANSLATE('1256364','2345678','BDEFGHI')  FROM dual;

Output 
---------- 
1BFGDGE


 SELECT TRANSLATE('1255','2455','BDEFGHI') AA FROM dual;

 OUTPUT:
---------------

  1BEE

Here the character 5 appears twice in the if string ('2455'); TRANSLATE uses the first occurrence, so 5 maps to E rather than F, and the leftover characters G, H and I in the then string are simply ignored.

Mapping Parameters & Variables



Use mapping parameters and variables to make mappings more flexible. Mapping parameters and variables represent values in mappings and mapplets. If you declare mapping parameters and variables in a mapping, you can reuse a mapping by altering the parameter and variable values of the mapping in the session. This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be changed.

 

To use a mapping parameter or variable in a mapping or mapplet, first declare it in that mapping or mapplet. Then you can define a value for the mapping parameter or variable before you run the session.

 

 

Uses of Mapping Parameter:

·                     A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.

·                     A mapping parameter cannot be changed while a session is running; it retains the same value throughout the session.

·                     If a mapping or mapplet is reusable, you can define different values for it in the parameter file of each session that uses it.

·                     When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet. Then define the value of the parameter in a parameter file. The Integration Service evaluates all references to the parameter to that value.

 

Mapping parameters and variables can be used in the following transformations:

·                     Source qualifier

·                     Filter

·                     Expression

·                     User-Defined Join

·                     Router

·                     Update strategy

·                     Lookup override

 

Uses of Mapping Variable:

Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session.

The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session.

A mapping variable can change dynamically any number of times throughout the session.

Use a variable function in the mapping to change the value of the variable.

 

At the beginning of a session, the Integration Service evaluates references to a variable to determine the start value. At the end of a successful session, the Integration Service saves the final value of the variable to the repository. The next time you run the session, the Integration Service evaluates references to the variable to the saved value. To override the saved value, define the start value of the variable in a parameter file or assign a value in the pre-session variable assignment in the session properties.

 

Variable functions can be used in the following transformations:

·                     Filter

·                     Expression

·                     Router

·                     Update strategy

 

Initial and Default Values

When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. The Integration Service uses the configured initial value for a mapping parameter when the parameter is not defined in the parameter file. Similarly, the Integration Service uses the configured initial value for a mapping variable when the variable value is not defined in the parameter file, and there is no saved variable value in the repository.

 

When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the datatype of the parameter or variable.

 

The following table lists the default values the Integration Service uses for different types of data:

 

Data         Default Value
String       Empty string
Numeric      0
Datetime     1/1/1753 A.D. (or 1/1/1 when the Integration Service is configured for compatibility with 4.0)

 

Using String Parameters and Variables

 

For example, we might use a parameter named $$State in the filter for a Source Qualifier transformation to extract rows for a particular state:

            STATE = '$$State'

 

During the session, the Integration Service replaces the parameter with a string. If $$State is defined as MD in the parameter file, the Integration Service replaces the parameter as follows:

            STATE = 'MD'
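For reference, the corresponding parameter file entry could look like the following sketch (the folder, workflow, and session names here are only placeholders):

[MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
$$State=MD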

 

You can perform a similar filter in the Filter transformation using the PowerCenter transformation language as follows:

            STATE = $$State

 

If you enclose the parameter in single quotes in the Filter transformation, the Integration Service reads it as the string literal "$$State" instead of replacing the parameter with "MD."

 

Variable Values

The Integration Service holds two different values for a mapping variable during a session run: 

1.             Start value of a mapping variable

2.             Current value of a mapping variable

 

The current value of a mapping variable changes as the session progresses. To use the current value of a mapping variable within the mapping or in another transformation, create the following expression with the SETVARIABLE function:

 

SETVARIABLE($$MAPVAR,NULL)

 

At the end of a successful session, the Integration Service saves the final current value of a mapping variable to the repository.

 

Start Value:

The start value is the value of the variable at the start of the session. The start value could be a value defined in the parameter file for the variable, a value assigned in the pre-session variable assignment, a value saved in the repository from the previous run of the session, a user defined initial value for the variable, or the default value based on the variable datatype. The Integration Service looks for the start value in the following order: 

1.             Value in parameter file

2.             Value in pre-session variable assignment

3.             Value saved in the repository

4.             Initial value

5.             Datatype default value

 

For example, you create a mapping variable in a mapping or mapplet and enter an initial value, but you do not define a value for the variable in a parameter file. The first time the Integration Service runs the session, it evaluates the start value of the variable to the configured initial value. The next time the session runs, the Integration Service evaluates the start value of the variable to the value saved in the repository. If you want to override the value saved in the repository before running a session, you need to define a value for the variable in a parameter file. When you define a mapping variable in the parameter file, the Integration Service uses this value instead of the value saved in the repository or the configured initial value for the variable. When you use a mapping variable ('$$MAPVAR') in an expression, the expression always returns the start value of the mapping variable. If the start value of MAPVAR is 0, then $$MAPVAR returns 0.

Current Value:

The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. As the session progresses, the Integration Service calculates the current value using a variable function that you set for the variable. Unlike the start value, the current value can change as the Integration Service evaluates it for each row that passes through the mapping. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository. The Integration Service states the value saved to the repository for each mapping variable in the session log.

Variable Datatype and Aggregation Type

When you declare a mapping variable in a mapping, you need to configure the datatype and aggregation type for the variable.

 

The datatype you choose for a mapping variable allows the Integration Service to pick an appropriate default value for the mapping variable. The default is used as the start value of a mapping variable when there is no value defined for a variable in the parameter file, in the repository, and there is no user defined initial value.

 

The Integration Service uses the aggregate type of a mapping variable to determine the final current value of the mapping variable. When you have a pipeline with multiple partitions, the Integration Service combines the variable value from each partition and saves the final current variable value into the repository.

 

You can create a variable with the following aggregation types:

·                     Count: Integer and small integer datatypes only.

·                     Max: All transformation datatypes except binary datatype.

·                     Min: All transformation datatypes except binary datatype.

 

You can configure a mapping variable for the Count aggregation type only when it is an Integer or Small Integer. You can configure mapping variables of any datatype except binary for the Max or Min aggregation types.

 

To keep the variable value consistent throughout the session run, the Designer limits the variable functions you use with a variable based on aggregation type. For example, use the SetMaxVariable function for a variable with a Max aggregation type, but not with a variable with a Min aggregation type.

 

Variable Functions

Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline. Use variable functions in an expression to set the value of a mapping variable for the next session run. The transformation language provides the following variable functions to use in a mapping:

·                     SetMaxVariable. Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. To use the SetMaxVariable with a mapping variable, the aggregation type of the mapping variable must be set to Max.

·                     SetMinVariable. Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. To use the SetMinVariable with a mapping variable, the aggregation type of the mapping variable must be set to Min.

·                     SetCountVariable. Increments the variable value by one. In other words, it adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. To use the SetCountVariable with a mapping variable, the aggregation type of the mapping variable must be set to Count.

·                     SetVariable. Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository. To use the SetVariable function with a mapping variable, the aggregation type of the mapping variable must be set to Max or Min. The SetVariable function ignores rows marked for delete or reject.
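For example, a sketch of maintaining a Max-aggregation variable in an Expression transformation (the variable and port names are hypothetical):

SETMAXVARIABLE($$MaxOrderDate, ORDER_DATE)

As each row passes through, the variable keeps the greatest ORDER_DATE value seen so far, and the final value is saved to the repository at the end of a successful session.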

 

Use variable functions only once for each mapping variable in a pipeline. The Integration Service processes variable functions as it encounters them in the mapping. The order in which the Integration Service encounters variable functions in the mapping may not be the same for every session run. This may cause inconsistent results when you use the same variable function multiple times in a mapping.

 

The Integration Service does not save the final current value of a mapping variable to the repository when any of the following conditions are true:

·                     The session fails to complete.

·                     The session is configured for a test load.

·                     The session is a debug session.

·                     The session runs in debug mode and is configured to discard session output.

 

Sample Mapping:

If you want to fetch only those records that were modified or newly created after the previous run, you need to create a user-defined mapping variable $$LastRunDateTime (Datetime datatype) that saves the timestamp of the last row the Integration Service read in the previous session.

 

Then, in the Source Qualifier, define the filter condition:

 

Syntax:

Table.DateTime_column > $$LastRunDateTime

 

Note: If you define the mapping variable as a string, you need to convert it to a date datatype.

Syntax:

Table.DateTime_column > TO_DATE($$LastRunDateTime, 'YYYY-MM-DD HH24:MI:SS')
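To advance $$LastRunDateTime for the next run, a common sketch (assuming the variable has the Max aggregation type) is to call a variable function in an Expression transformation output port:

SETMAXVARIABLE($$LastRunDateTime, Table.DateTime_column)

At the end of a successful session, the Integration Service saves the maximum timestamp it read to the repository, so the next run's filter picks up exactly where the previous one left off.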

To create the mapping variable:

1. In the Mapping Designer, click Mappings (or, in the Mapplet Designer, click Mapplets).

2. Select 'Parameters and Variables'

3. Click the Add button:

 

Field: Name
Description: Parameter name. The parameter name must be $$ followed by any alphanumeric or underscore characters.

Field: Type
Description: Variable or parameter. Select Parameter.

Field: Datatype
Description: Datatype of the parameter.

Field: Precision or Scale
Description: Precision and scale of the parameter.

Field: Aggregation
Description: Used for variables.

Field: IsExprVar
Description: Determines how the Integration Service expands the parameter in an expression string. If true, the Integration Service expands the parameter before parsing the expression. If false, the Integration Service expands the parameter after parsing the expression. Default is false. Note: If you set this field to true, you must set the parameter datatype to String, or the Integration Service fails the session.

Field: Initial Value
Description: Initial value of the parameter. If you do not set a value for the parameter in the parameter file, the Integration Service uses this value for the parameter during sessions. If this value is undefined, the Integration Service uses a default value based on the datatype (String = '', Integer = 0).

Field: Description
Description: Description associated with the parameter.

 

Click on 'OK'

 

4. Double-click the Source Qualifier and go to the Properties tab.

From Source Filter, click Open Editor and go to the Variables tab to add the mapping variable to the filter condition.

 

Select the mapping variable from the Mapping Variables folder and double-click it to add it to the filter condition.

 

Click on 'OK'.

Click on Apply and OK.

 

Note: The steps are similar for mapping parameters.

You just need to define values in the parameter file for the mapping parameters and mapping variables you declared in the mapping/mapplet.