Monday, September 13, 2010

Normalizing and loading Excel files using Informatica Powercenter

Sometimes it is a requirement where information is managed in so-called flat files (i.e.: .csv, .xls, etc) which are not user-friendly. The purpose of this article is to explain the process of loading such files in the data warehouse using informatica power center.

Scenario

A sample source file is shown below:

Clipboard01

And the expected result to be loaded into the DW is as below-

2

Solution

One possible way to achieve this is explained below -

In Excel

1 – Define a range in Excel selecting the following area (as can be seen below) and name this “Budget”

Clipboard03

In Windows

2 – Create an ODBC connection to the excel file where you defined the range in the previous point.

In Informatica Powercenter

3 – In Source Analyzer, import a table from the database choosing the ODBC connection defined in the previous point selecting the “Budget” table or the name given to the table in the point 1.

4 – Create a target table with the relevant target fields. Then in Target Designer, import the table where you want to load the data normalized. You can see a simplified version of the table below.

4

5 – In Mapping Designer, create a mapping which should look like the following.

Clipboard05

Tips:

– We’ll use just the first 13th port to link “Source Qualifier” and “Normalize” transformations. (From port called “Sales_Rep” to port called “Dec”, the columns that we want to load into the target table)


6 – Normalizing – Configuring the Transformation

As you can see in the snapshot, we have to create two ports inside the tab called “Normalizer”, one for the Sales Representative and the other for the Budgets.

6

Sales_Rep: is the first column that we have in the Excel file.

Budget: in the option called “occurs”, we have to divide it into 12 columns because it is the number of Budgets per year (monthly budget).

7

In the tab called “Ports” we can see that several ports were generated automatically.

8

7 – Configuring the “Expression”

We have to configure the expression as seen in the snapshot.

9

Tips:

Family_Group: a parameter has to be defined into the mapping, called $$Family and define it as Parameter type, because this mapping will be executed twice, but with different Families (Family1 & Family2)

GCID_Period_OUT: a new column with the following formula has to be created:

DECODE(GCID_Period_IN, 1,'Jan', 2,'Feb', 3,'Mar', 4,'Apr', 5,'May', 6,'Jun', 7,'Jul', 8,'Aug', 9,'Sep', 10,'Oct', 11,'Nov','Dec')

The values that we have in GCID_Period_IN, is sequential beginning from 1 up to the total number of occurrences (in this case 12) defined in point 6 which are basically the number of months.

In Workflow Manager

8 – Configuring the workflow

As can be seen in the snapshot below, the same mapping will be executed twice, taking into account the following configurations.

10

Session Configurations:

The first session:

The Parameter Filename has to be set (paramfile_Family1.txt, which should be defined inside by the following parameter: $$Family = ‘Family1’

The second session:

Another Parameter Filename has to be set (paramfile_Family1.txt, which should be defined by the following parameters, $$Family = ‘Family2’ and VERY IMPORTANT, we have to override the SQL query into the session using the following:

text

11

Basically, you are changing the source columns to load into the second session, the Family2’s budget, as can be seen in the following diagram.

Clipboard012