Handling primary key duplicates in a data warehouse load

Posted by Meff on Stack Overflow
Published on 2010-04-30T15:55:16Z

Filed under: etl | ssis | data-warehouse

I'm building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is one row per transaction. To ensure I don't load duplicate rows, I've put a primary key on the fact table: the transaction ID.

I've run into a problem with transactions that are reversed. In the transactional database a reversal is recorded via a status, which I pick up so I can tell whether the transaction went through or was rolled back, and load a reversal row into the warehouse when needed. However, the reversal row has the same transaction ID as the original, so I get a primary key violation.
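A minimal sketch of the collision, assuming an in-memory SQLite table stands in for the SQL Server fact table, that the fact row carries an `amount` measure which a reversal negates, and that source statuses map to "reversal" through a hypothetical `is_reversal_status` helper (the real status codes are not given in the question). The primary key on the transaction ID correctly rejects a re-loaded payment, but it rejects the legitimate reversal row for exactly the same reason:

```python
import sqlite3

# Hypothetical status codes; the source system's real values are not known.
REVERSAL_STATUSES = {"REVERSED", "ROLLED_BACK"}


def is_reversal_status(status: str) -> bool:
    """Map a source-system status to 'this row is a reversal'."""
    return status in REVERSAL_STATUSES


def try_load(conn: sqlite3.Connection, txn_id: int, amount: float, status: str) -> bool:
    """Insert one fact row; return False if the primary key rejects it."""
    # Assumption: a reversal row stores the negated amount of the original payment.
    signed_amount = -amount if is_reversal_status(status) else amount
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fact_payment (transaction_id, amount) VALUES (?, ?)",
                (txn_id, signed_amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_payment (transaction_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

loaded_payment = try_load(conn, 1, 50.0, "COMPLETED")    # first load: accepted
reloaded_payment = try_load(conn, 1, 50.0, "COMPLETED")  # duplicate: rejected, as intended
loaded_reversal = try_load(conn, 1, 50.0, "REVERSED")    # reversal: rejected, same key
print(loaded_payment, reloaded_payment, loaded_reversal)
```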

I've solved this for now by negating the primary key: transaction ID 1 is the payment, and transaction ID -1 (in the warehouse only) is its reversal.
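A minimal sketch of the negated-key scheme, again using an in-memory SQLite table as a stand-in for the fact table. The helper names `warehouse_key` and `source_id` are hypothetical, and the `amount` column is an assumption. The scheme only works if source transaction IDs are strictly positive (an ID of 0 would give the payment and its reversal the same key), so the sketch rejects anything else:

```python
import sqlite3


def warehouse_key(txn_id: int, is_reversal: bool) -> int:
    """Negated-key scheme: a payment keeps its ID, its reversal gets -ID."""
    if txn_id <= 0:
        # Negation only separates the two rows when the source ID is positive.
        raise ValueError(f"source transaction ID must be positive, got {txn_id}")
    return -txn_id if is_reversal else txn_id


def source_id(key: int) -> int:
    """Recover the original transaction ID, e.g. for joins back to the source."""
    return abs(key)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_payment (transaction_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
with conn:
    conn.execute("INSERT INTO fact_payment VALUES (?, ?)", (warehouse_key(1, False), 50.0))
    conn.execute("INSERT INTO fact_payment VALUES (?, ?)", (warehouse_key(1, True), -50.0))

rows = conn.execute(
    "SELECT transaction_id, amount FROM fact_payment ORDER BY transaction_id"
).fetchall()
print(rows)  # [(-1, -50.0), (1, 50.0)]
```

One trade-off worth weighing: the key now encodes two things (which transaction, and which direction), so every join back to the source, or to other tables keyed on the real transaction ID, has to strip the sign first.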

I have also considered an alternative: adding a BIT column, where 0 means a normal transaction and 1 means a reversal, and making the primary key the combination of the transaction ID and the BIT column.
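A comparable sketch of the composite-key alternative, under the same assumptions (SQLite as a stand-in, a hypothetical `amount` measure with reversals stored as negative amounts). SQLite has no BIT type, so an INTEGER constrained to 0 or 1 plays that role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER restricted to 0/1 stands in for SQL Server's BIT column.
conn.execute(
    """
    CREATE TABLE fact_payment (
        transaction_id INTEGER NOT NULL,
        is_reversal    INTEGER NOT NULL DEFAULT 0 CHECK (is_reversal IN (0, 1)),
        amount         REAL    NOT NULL,
        PRIMARY KEY (transaction_id, is_reversal)
    )
    """
)


def try_load(txn_id: int, is_reversal: int, amount: float) -> bool:
    """Insert one fact row; return False if a key or CHECK constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO fact_payment (transaction_id, is_reversal, amount) "
                "VALUES (?, ?, ?)",
                (txn_id, is_reversal, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_load(1, 0, 50.0),   # payment: accepted
    try_load(1, 1, -50.0),  # reversal of the same transaction: accepted
    try_load(1, 1, -50.0),  # re-loaded reversal: rejected as a duplicate
]

# The transaction ID still joins directly to the source; the flag separates the rows.
net = conn.execute(
    "SELECT transaction_id, SUM(amount) FROM fact_payment GROUP BY transaction_id"
).fetchall()
print(results, net)  # [True, True, False] [(1, 0.0)]
```

Here the transaction ID keeps its meaning as the source key, and the pair (transaction ID, flag) still guarantees that a payment and its reversal can each be loaded only once.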

My question is: is this good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values are never modified; there will only ever be transactions and reversals.

© Stack Overflow or respective owner