If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material.  In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all.  
The problem is, predictably, that performance is not very good.  
The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts.  In this post, I’m going to look at 
the problem from a slightly different angle, 
and present an alternative solution to 
the one Brad found.  Inevitably, there’s going to be some overlap between our entries, 
and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here.  
The Example  We’ll use data from 
the AdventureWorks database, copied to temporary unindexed tables.  A script to create these structures is shown below:          CREATE  TABLE #Custs
            (
            CustomerID          INTEGER NOT NULL,
            TerritoryID         INTEGER NULL,
            CustomerType        NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
            );
    GO        
    CREATE  TABLE #Prods
            (
            ProductMainID       INTEGER NOT NULL,
            ProductSubID        INTEGER NOT NULL,
            ProductSubSubID     INTEGER NOT NULL,
            Name                NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
            );
    GO        
    CREATE  TABLE #OrdHeader
            (
            SalesOrderID        INTEGER NOT NULL,
            OrderDate           DATETIME NOT NULL,
            SalesOrderNumber    NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
            CustomerID          INTEGER NOT NULL,
            );
    GO
    CREATE  TABLE #OrdDetail
            (
            SalesOrderID        INTEGER NOT NULL,
            OrderQty            SMALLINT NOT NULL,
            LineTotal           NUMERIC(38,6) NOT NULL,
            ProductMainID       INTEGER NOT NULL,
            ProductSubID        INTEGER NOT NULL,
            ProductSubSubID     INTEGER NOT NULL,
            );
    GO
    INSERT  #Custs
            (
            CustomerID, 
            TerritoryID, 
            CustomerType
            )
    SELECT  C.CustomerID,
            C.TerritoryID,
            C.CustomerType
    FROM    AdventureWorks.Sales.Customer C 
            WITH (TABLOCK);
    GO
    INSERT  #Prods 
            (
            ProductMainID, 
            ProductSubID, 
            ProductSubSubID,
            Name
            )
    SELECT  P.ProductID,
            P.ProductID,
            P.ProductID,
            P.Name
    FROM    AdventureWorks.Production.Product P
            WITH (TABLOCK);
    GO 
    INSERT  #OrdHeader
            (
            SalesOrderID, 
            OrderDate, 
            SalesOrderNumber, 
            CustomerID
            )
    SELECT  H.SalesOrderID,
            H.OrderDate,
            H.SalesOrderNumber,
            H.CustomerID
    FROM    AdventureWorks.Sales.SalesOrderHeader H
            WITH (TABLOCK);
    GO
    INSERT  #OrdDetail
            (
            SalesOrderID, 
            OrderQty, 
            LineTotal, 
            ProductMainID, 
            ProductSubID, 
            ProductSubSubID
            )
    SELECT  D.SalesOrderID,
            D.OrderQty,
            D.LineTotal,
            D.ProductID,
            D.ProductID,
            D.ProductID
    FROM    AdventureWorks.Sales.SalesOrderDetail D
            WITH (TABLOCK);
The query itself is a simple join of 
the four tables:
  
    SELECT  P.ProductMainID AS PID,
            P.Name,
            D.OrderQty,
            H.SalesOrderNumber,
            H.OrderDate,
            C.TerritoryID
    FROM    #Prods P
    JOIN    #OrdDetail D 
            ON  P.ProductMainID = D.ProductMainID
            
AND P.ProductSubID = D.ProductSubID
            
AND P.ProductSubSubID = D.ProductSubSubID
    JOIN    #OrdHeader H 
            ON  D.SalesOrderID = H.SalesOrderID
    JOIN    #Custs C 
            ON  H.CustomerID = C.CustomerID 
    ORDER   BY
            P.ProductMainID ASC
    OPTION  (RECOMPILE, MAXDOP 1);
Remember that these tables have no indexes at all, 
and only 
the single-column sampled statistics 
SQL Server automatically creates (assuming default settings).  
The estimated query plan produced for 
the test query looks 
like this (click to enlarge):
The Problem
The problem here is one of cardinality estimation – 
the number of rows 
SQL Server expects to find at each step of 
the plan.  
The lack of indexes 
and useful statistical information means that 
SQL Server does not have 
the information it needs to make a good estimate.  Every join in 
the plan shown above estimates that it will produce just a single row as output.  Brad covers 
the factors that lead to 
the low estimates in his post.
In reality, 
the join between 
the #Prods 
and #OrdDetail tables will produce 121,317 rows.  It should not surprise you that this has rather dire consequences for 
the remainder of 
the query plan.  In particular, it makes a nonsense of 
the optimizer’s decision to use Nested Loops to join to 
the two remaining tables.  
Instead of scanning 
the #OrdHeader 
and #Custs tables once (as it expected), it has to perform 121,317 full scans of each.  
The query takes somewhere in 
the region of twenty minutes to run to completion on my development machine.
A Solution
At this point, you may be thinking 
the same thing I was: if we really are stuck with no indexes, 
the best we can do is to use hash joins everywhere.
We can force 
the exclusive use of hash joins in several ways, 
the two most common being join 
and query hints.  A join hint means writing 
the query using 
the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at 
the bottom of 
the query.  
The difference is that using join hints also forces 
the order of 
the join, whereas 
the query hint gives 
the optimizer freedom to reorder 
the joins at its discretion.
Adding 
the OPTION (HASH JOIN) hint results in this estimated plan:
That produces 
the correct output in around seven seconds, which is quite an improvement!  As a purely practical matter, 
and given 
the rigid rules of 
the environment we find ourselves in, we might leave things there.  (We can improve 
the hashing solution a bit – I’ll come back to that later on).
Faster Nested Loops
It might surprise you to hear that we can beat 
the performance of 
the hash join solution shown above using nested loops joins exclusively, 
and without breaking 
the rules we have been set.
The key to this part is to realize that a condition 
like (A = B) can be expressed as (A <= B) 
AND (A >= B).  Armed with this tremendous new insight, we can rewrite 
the join predicates 
like so:
  
    SELECT  P.ProductMainID AS PID,
            P.Name,
            D.OrderQty,
            H.SalesOrderNumber,
            H.OrderDate,
            C.TerritoryID
    FROM    #OrdDetail D
    JOIN    #OrdHeader H
            ON  D.SalesOrderID >= H.SalesOrderID
            
AND D.SalesOrderID <= H.SalesOrderID
    JOIN    #Custs C
            ON  H.CustomerID >= C.CustomerID
            
AND H.CustomerID <= C.CustomerID
    JOIN    #Prods P
            ON  P.ProductMainID >= D.ProductMainID
            
AND P.ProductMainID <= D.ProductMainID
            
AND P.ProductSubID = D.ProductSubID
            
AND P.ProductSubSubID = D.ProductSubSubID
    ORDER   BY 
            D.ProductMainID
    OPTION  (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER);
I’ve also added LOOP JOIN 
and FORCE ORDER query hints to ensure that only nested loops joins are used, 
and that 
the tables are joined in 
the order they appear.  
The new estimated execution plan is:
This new query runs in under 2 seconds.
Why Is It Faster?
The main reason for 
the improvement is 
the appearance of 
the eager Index Spools, which are also known as index-on-the-fly spools.  If you read my Inside 
The Optimiser series you might be interested to know that 
the rule responsible is called JoinToIndexOnTheFly.
An eager index spool consumes all rows from 
the table it sits above, 
and builds a index suitable for 
the join to seek on.  Taking 
the index spool above 
the #Custs table as an example, it reads all 
the CustomerID 
and TerritoryID values with a single scan of 
the table, 
and builds an index keyed on CustomerID.  
The term ‘eager’ means that 
the spool consumes all of its input rows when it starts up.  
The index is built in a work table in tempdb, has no associated statistics, 
and only exists until 
the query finishes executing.
The result is that each unindexed table is only scanned once, 
and just for 
the columns necessary to build 
the temporary index.  From that point on, every execution of 
the inner side of 
the join is answered by a seek on 
the temporary index – not 
the base table.
A second optimization is that 
the sort on ProductMainID (required by 
the ORDER BY clause) is performed early, on just 
the rows coming from 
the #OrdDetail table.  
The optimizer has a good estimate for 
the number of rows it needs to sort at that stage – it is just 
the cardinality of 
the table itself.  
The accuracy of 
the estimate there is important because it helps determine 
the memory grant given to 
the sort operation.  Nested loops join preserves 
the order of rows on its outer input, so sorting early is safe.  (Hash joins do not preserve order in this way, of course).
The extra lazy spool on 
the #Prods branch is a further optimization that avoids executing 
the seek on 
the temporary index if 
the value being joined (the ‘outer reference’) hasn’t changed from 
the last row received on 
the outer input.  It takes advantage of 
the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at 
the join operator one after 
the other.
The optimizer is quite conservative about introducing index spools into a plan, because creating 
and dropping a temporary index is a relatively expensive operation.  It’s presence in a plan is often an indication that a useful index is missing.
I want to stress that I rewrote 
the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system.
Improving 
the Hash Join
I promised I would return to 
the solution that uses hash joins.  You might be puzzled that 
SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins.  
The answer, again, is down to 
the poor information available to 
the optimizer.  Let’s look at 
the hash join plan again:
Two of 
the hash joins have single-row estimates on their build inputs.  
SQL Server fixes 
the amount of memory available for 
the hash table based on this cardinality estimate, so at run time 
the hash join very quickly runs out of memory.
This results in 
the join spilling hash buckets to disk, 
and any rows from 
the probe input that hash to 
the spilled buckets also get written to disk.  
The join process then continues, 
and may again run out of memory.  This is a recursive process, which may eventually result in 
SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow.  
The data sizes in 
the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion.  You can see this for yourself by tracing 
the Hash Warning event using 
the Profiler tool.
The final sort in 
the plan also suffers from a similar problem: it receives very little memory 
and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this).  Notice also that because hash joins don’t preserve sort order, 
the sort cannot be pushed down 
the plan toward 
the #OrdDetail table, as in 
the nested loops plan.
Ok, so now we understand 
the problems, what can we do to fix it?  We can address 
the hash spilling by forcing a different order for 
the joins:
  
    SELECT  P.ProductMainID AS PID,
            P.Name,
            D.OrderQty,
            H.SalesOrderNumber,
            H.OrderDate,
            C.TerritoryID
    FROM    #Prods P
    JOIN    #Custs C
    JOIN    #OrdHeader H
            ON  H.CustomerID = C.CustomerID
    JOIN    #OrdDetail D
            ON  D.SalesOrderID = H.SalesOrderID
            ON  P.ProductMainID = D.ProductMainID
            
AND P.ProductSubID = D.ProductSubID
            
AND P.ProductSubSubID = D.ProductSubSubID
    ORDER   BY 
            D.ProductMainID
    OPTION  (MAXDOP 1, HASH JOIN, FORCE ORDER);
With this plan, each of 
the inputs to 
the hash joins has a good estimate, 
and no hash recursion occurs.  
The final sort still suffers from 
the one-row estimate problem, 
and we get a single-pass sort warning as it writes rows to disk.  Even so, 
the query runs to completion in three or four seconds.  That’s around half 
the time of 
the previous hashing solution, but still not as fast as 
the nested loops trickery.
Final Thoughts
SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information.  We can’t really blame 
the performance problems highlighted here on anything other than 
the decision to use completely unindexed tables, 
and not to allow 
the creation of additional statistics.
I should probably stress that 
the nested loops solution shown above is not one I would normally contemplate in 
the real world.  It’s there primarily for its educational 
and entertainment value.  I might perhaps use it to demonstrate to 
the sceptical that 
SQL Server itself is crying out for an index.
Be sure to read Brad’s original post for more details.  My grateful thanks to him for granting permission to reuse some of his material.
Paul White 
  Email: 
[email protected] 
  Twitter: @PaulWhiteNZ