More hardware normally delivers better performance, but there are exceptions where it can hinder performance. Understanding these exceptions and working around it is a major part of SQL Server performance tuning.
 
When a memory allocating query executes in parallel, SQL Server distributes memory to each task that is executing part of the query in parallel. In our example the sort operator that executes in parallel divides the memory across all tasks assuming even distribution of rows. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.
 
In reality, how often are column values evenly distributed, think about an example; are employees working for your company distributed evenly across all the Zip codes or mainly concentrated in the headquarters? What happens when you sort result set based on Zip codes? Do all products in the catalog sell equally or are few products hot selling items?
 
One of my customers tested the below example on a 24 core server with various MAXDOP settings and here are the results:MAXDOP 1: 
CPU time = 1185 ms, elapsed time = 1188 msMAXDOP 4: 
CPU time = 1981 ms, elapsed time = 1568 msMAXDOP 8: 
CPU time = 1918 ms, elapsed time = 1619 msMAXDOP 12: 
CPU time = 2367 ms, elapsed time = 2258 msMAXDOP 16: 
CPU time = 2540 ms, elapsed time = 2579 msMAXDOP 20: 
CPU time = 2470 ms, elapsed time = 2534 msMAXDOP 0: 
CPU time = 2809 ms, elapsed time = 2721 ms - all 24 cores.In the above test, when the data was evenly distributed, the elapsed time of parallel query was always lower than serial query.
 
Why does the query get slower and slower with more 
CPU cores / higher MAXDOP? Maybe you can answer this question after reading the article; let me know: 
[email protected].
 
Well you get the point, let’s see an example.
 
The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.
 
Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001.
update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1
update Employees set Zip = 2001 where EmployeeID % 50 != 1
go
update statistics Employees with fullscan
go
 
Let’s create the temporary table #FireDrill with all possible Zip codes.
drop table #FireDrill
go
create table #FireDrill (Zip int primary key)
insert into #FireDrill select distinct Zip from Employees
update statistics #FireDrill with fullscan
go  
Let’s execute the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--First serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 1)
goThe query took 1011 ms to complete. 
 
The execution plan shows the 77816 KB of memory was granted while the estimated rows were 799624.  
No Sort Warnings in SQL Server Profiler.  
Now let’s execute the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 0)
go
The query took 1912 ms to complete.  
The execution plan shows the 79360 KB of memory was granted while the estimated rows were 799624.  
The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead.
Sort properties shows the rows are unevenly distributed over the 4 threads. 
 
Sort Warnings in SQL Server Profiler.  
 Intermediate Summary: The reason for the higher duration with parallel plan was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001. 
Now let’s update the Employees table and distribute employees evenly across all Zip codes.
 
update Employees set Zip = EmployeeID / 400 + 1
go
update statistics Employees with fullscan
go
 Let’s execute the query serially with MAXDOP 1. 
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 1)
go
 
The query took 751 ms to complete.  
The execution plan shows the 77816 KB of memory was granted while the estimated rows were 784707.  
No Sort Warnings in SQL Server Profiler.   
Now let’s execute the query in parallel with MAXDOP 0. 
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 0)
go
The query took 661 ms to complete.  
The execution plan shows the 79360 KB of memory was granted while the estimated rows were 784707.  
Sort properties shows the rows are evenly distributed over the 4 threads.
No Sort Warnings in SQL Server Profiler.  
 
Intermediate Summary: When employees were distributed unevenly, concentrated on 1 Zip code, parallel sort spilled while serial sort performed well without spilling to tempdb. When the employees were distributed evenly across all Zip codes, parallel sort and serial sort did not spill to tempdb. This shows uneven data distribution may affect the performance of some parallel queries negatively. For detailed discussion of memory allocation, refer to webcasts available at www.sqlworkshops.com/webcasts.
 
 
Some of you might conclude from the above execution times that parallel query is not faster even when there is no spill. Below you can see when we are joining limited amount of Zip codes, parallel query will be fasted since it can use Bitmap Filtering.
 
Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001.
update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1
update Employees set Zip = 2001 where EmployeeID % 50 != 1
go
update statistics Employees with fullscan
go  
Let’s create the temporary table #FireDrill with limited Zip codes.
drop table #FireDrill
go
create table #FireDrill (Zip int primary key)
insert into #FireDrill select distinct Zip
      from Employees where Zip between 1800 and 2001
update statistics #FireDrill with fullscan
go  
Let’s execute the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 1)
go
The query took 989 ms to complete.  
The execution plan shows the 77816 KB of memory was granted while the estimated rows were 785594. 
No Sort Warnings in SQL Server Profiler.  
Now let’s execute the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 0)
go
The query took 1799 ms to complete.  
The execution plan shows the 79360 KB of memory was granted while the estimated rows were 785594.  
Sort Warnings in SQL Server Profiler.  
 
The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead.
 Intermediate Summary: The reason for the higher duration with parallel plan even with limited amount of Zip codes was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001. 
 
Now let’s update the Employees table and distribute employees evenly across all Zip codes.
update Employees set Zip = EmployeeID / 400 + 1
go
update statistics Employees with fullscan
go
Let’s execute the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 1)
go
The query took 250  ms to complete.  
The execution plan shows the 9016 KB of memory was granted while the estimated rows were 79973.8.  
No Sort Warnings in SQL Server Profiler.  
Now let’s execute the query in parallel with MAXDOP 0.  
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
      inner join #FireDrill fd on (e.Zip = fd.Zip)
      order by e.Zip
option (maxdop 0)
go
The query took 85 ms to complete.  
The execution plan shows the 13152 KB of memory was granted while the estimated rows were 784707.  
No Sort Warnings in SQL Server Profiler.  
 
Here you see, parallel query is much faster than serial query since SQL Server is using Bitmap Filtering to eliminate rows before the hash join.
 
Parallel queries are very good for performance, but in some cases it can hinder performance. If one identifies the reason for these hindrances, then it is possible to get the best out of parallelism. I covered many aspects of monitoring and tuning parallel queries in webcasts (www.sqlworkshops.com/webcasts) and articles (www.sqlworkshops.com/articles). I suggest you to watch the webcasts and read the articles to better understand how to identify and tune parallel query performance issues.
 
Summary: One has to avoid sort spill over tempdb and the chances of spills are higher when a query executes in parallel with uneven data distribution. Parallel query brings its own advantage, reduced elapsed time and reduced work with Bitmap Filtering. So it is important to understand how to avoid spills over tempdb and when to execute a query in parallel.
 
I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.
 
Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.
 
Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners.
Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.
This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.
 
Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.
 
R Meyyappan 
[email protected]
LinkedIn: http://at.linkedin.com/in/rmeyyappan