Merge Join component sorted outputs [SSIS]

Posted by jamiet on SQL Blog See other posts from SQL Blog or by jamiet
Published on Mon, 31 Jan 2011 19:40:00 GMT Indexed on 2011/01/31 23:31 UTC
Read the original article Hit count: 238

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this:

Why isn’t the Merge Join output sorted (i.e.IsSorted=True)?

This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it.

Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component:

image

Let’s take a look inside the editor of the Merge Join:

image

We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look:

image

Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning:

Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed.

The warning proves that the output from the Merge Join is sorted!

It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output.

So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there!

@Jamiet 

P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

© SQL Blog or respective owner

Related posts about dataflow

Related posts about IsSorted

  • Merge Join component sorted outputs [SSIS]

    as seen on SQL Blog - Search for 'SQL Blog'
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well… >>> More

  • Merge Join component sorted outputs [SSIS]

    as seen on SQL Blog - Search for 'SQL Blog'
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well… >>> More

  • The blocking nature of aggregates

    as seen on SQL Blog - Search for 'SQL Blog'
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data… >>> More

  • The blocking nature of aggregates

    as seen on SQL Blog - Search for 'SQL Blog'
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data… >>> More

  • 3-way quicksort, question

    as seen on Stack Overflow - Search for 'Stack Overflow'
    I am trying to understand the 3-way radix Quicksort, and i dont understand why the the CUTOFF variable there? and the insertion method? public class Quick3string { private static final int CUTOFF = 15; // cutoff to insertion sort // sort the array a[] of strings public static void… >>> More