SSIS - Range lookups

Posted by Repieter on SQL Blogcasts See other posts from SQL Blogcasts or by Repieter
Published on Fri, 04 Feb 2011 12:41:00 GMT Indexed on 2011/02/04 15:29 UTC
Read the original article Hit count: 240

 

When developing an ETL solution in SSIS we sometimes need to do range lookups in SSIS. Several solutions for this can be found on the internet, but now we have built another solution which I would like to share, since it's pretty easy to implement and the performance is fast.

 

You can download the sample package to see how it works. Make sure you have the AdventureWorks2008R2 and AdventureWorksDW2008R2 databases installed. (Apologies for the layout of this blog, I don't do this too often :))

 

To give a little bit more information about the example, this is basically what is does: we load a facttable and do an SCD type 2 lookup operation of the Product dimension. This is done with a script component.

 

First we query the Data warehouse to create the lookup dataset. The query that is used for that is:

 

SELECT

    [ProductKey]

    ,[ProductAlternateKey]

    ,[StartDate]

    ,ISNULL([EndDate], '9999-01-01') AS EndDate

FROM [DimProduct]

 

 

The output of this query is stored in a DataTable:

 

 

string lookupQuery = @"

                        SELECT

                            [ProductKey]

                            ,[ProductAlternateKey]

                            ,[StartDate]

                            ,ISNULL([EndDate], '9999-01-01') AS EndDate

                        FROM [DimProduct]";

 

        OleDbCommand oleDbCommand = new OleDbCommand(lookupQuery, _oleDbConnection);

        OleDbDataAdapter adapter = new OleDbDataAdapter(oleDbCommand);

 

        _dataTable = new DataTable();

        adapter.Fill(_dataTable);

 

 

Now that the dimension data is stored in the DataTable we use the following method to do the actual lookup:

 

public int RangeLookup(string businessKey, DateTime lookupDate)

    {

        // set default return value (Unknown)

        int result = -1;

 

        DataRow[] filteredRows;

        filteredRows = _dataTable.Select(string.Format("ProductAlternateKey = '{0}'", businessKey));

 

        for (int i = 0; i < filteredRows.Length; i++)

        {

            // check if the lookupdate is found between the startdate and enddate of any of the records

            if (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])

            {

                result = (filteredRows[i][0] == null) ? -1 : (int)filteredRows[i][0];

                break;

            }

        }

 

        filteredRows = null;

 

        return result;

    }

 

 

 

This method is executed for every row that passes the script component. This is implemented in the ProcessInputRow method

 

public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

        // Perform the lookup operation on the current row and put the value in the Surrogate Key Attribute

        Row.ProductKey = RangeLookup(Row.ProductNumber, Row.OrderDate);

    }

 

Now what actually happens?!

 

  • 1. Every record passes the business key and the orderdate to the RangeLookup method.
  • 2. The DataTable is then filtered on the business key of the current record. The output is stored in a DataRow [] object.
  • 3. We loop over the DataRow[] object to see where the orderdate meets the following expression:

(lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])

  • 4. When the expression returns true (so where the data is between the Startdate and the EndDate), the surrogate key of the dimension record is returned

 

We have done some testing with this solution and it works great for us. Hope others can use this example to do their range lookups.

© SQL Blogcasts or respective owner

Related posts about SSIS 2008 Range Lookup