Search Results

Search found 150 results on 6 pages for 'tempdb'.

Page 6/6 | < Previous Page | 2 3 4 5 6

T SQL Rotate row into columns

- by cshah

SQL 2005 using T-SQL, I want to rotate rows into columns. Sample script: Use TempDB Go CREATE TABLE [dbo].[CPPrinter_InkLevels]( [CPPrinter_InkLevels_ID] [int] IDENTITY(1,1) NOT NULL, [CPMeasurementGUID] [uniqueidentifier] NOT NULL, [InkName] [varchar](30) NOT NULL, [InkLevel] [decimal](6, 2) NOT NULL, CONSTRAINT [PK_CPPrinter_InkLevels] PRIMARY KEY CLUSTERED ( [CPPrinter_InkLevels_ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET IDENTITY_INSERT [dbo].[CPPrinter_InkLevels] ON INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (1, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Black', CAST(0.60 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (2, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Cyan', CAST(0.69 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (3, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Magenta', CAST(0.55 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (4, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Yellow', CAST(0.51 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (5, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Light Black', CAST(0.64 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (6, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Light Cyan', CAST(0.43 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (7, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Light Magenta', CAST(0.30 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (8, N'6acc1562-4e02-45ff-b480-9e01fb97fccf', N'Waste Tank', CAST(0.18 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (9, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Black', CAST(0.60 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (10, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Cyan', CAST(0.69 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (11, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Magenta', CAST(0.55 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (12, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Yellow', CAST(0.51 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (13, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Light Black', CAST(0.64 AS Decimal(6, 2))) INSERT [dbo].[CPPrinter_InkLevels] ([CPPrinter_InkLevels_ID], [CPMeasurementGUID], [InkName], [InkLevel]) VALUES (14, N'932348a7-6e2f-4a10-9760-be1ae640c7d7', N'Light Cyan', CAST(0.43 AS Decimal(6, 2))) Go SELECT * FROM [dbo].[CPPrinter_InkLevels] --Desired output CPMeasuremnetGUID, Ink1, Level1, Ink2, Level2, Ink3, Level3....

Read the article
I want just the insert query for a temp table.

- by John Stephen

Hi..I am using C#.Net and Sql Server ( Windows Application ). I had created a temporary table. When a button is clicked, temporary table (#tmp_emp_details) is created. I am having another button called "insert Values" and also 5 textboxes. The values that are entered in the textbox are used and whenever com.ExecuteNonQuery(); line comes, it throws an error message Invalid object name '#tbl_emp_answer'.. Below is the set of code.. Please give me a solution. Code for insert (in insert value button): private void btninsertvalues_Click(object sender, EventArgs e) { username = txtusername.Text; examloginid = txtexamloginid.Text; question = txtquestion.Text; answer = txtanswer.Text; useranswer = txtanswer.Text; SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=tempdb;Integrated Security=True;"); SqlCommand com = new SqlCommand("Insert into #tbl_emp_answer values('"+username+"','"+examloginid+"','"+question+"','"+answer+"','"+useranswer+"')", con); con.Open(); com.ExecuteNonQuery(); con.Close(); }

Read the article
I want a insert query for a temp table

- by John Stephen

Hi..I am using C#.Net and Sql Server ( Windows Application ). I had created a temporary table. When a button is clicked, temporary table (#tmp_emp_answer) is created. I am having another button called "insert Values" and also 5 textboxes. The values that are entered in the textbox are used and whenever com.ExecuteNonQuery(); line comes, it throws an error message Invalid object name '#tbl_emp_answer'.. Below is the set of code.. Please give me a solution. Code for insert (in insert value button): private void btninsertvalues_Click(object sender, EventArgs e) { username = txtusername.Text; examloginid = txtexamloginid.Text; question = txtquestion.Text; answer = txtanswer.Text; useranswer = txtanswer.Text; SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=tempdb;Integrated Security=True;"); SqlCommand com = new SqlCommand("Insert into #tbl_emp_answer values('"+username+"','"+examloginid+"','"+question+"','"+answer+"','"+useranswer+"')", con); con.Open(); com.ExecuteNonQuery(); con.Close(); }

Read the article
Is there a reason why SSIS significantly slows down after a few minutes?

- by Mark

I'm running a fairly substantial SSIS package against SQL 2008 - and I'm getting the same results both in my dev environment (Win7-x64 + SQL-x64-Developer) and the production environment (Server 2008 x64 + SQL Std x64). The symptom is that initial data loading screams at between 50K - 500K records per second, but after a few minutes the speed drops off dramatically and eventually crawls embarrasingly slowly. The database is in Simple recovery model, the target tables are empty, and all of the prerequisites for minimally logged bulk inserts are being met. The data flow is a simple load from a RAW input file to a schema-matched table (i.e. no complex transforms of data, no sorting, no lookups, no SCDs, etc.) The problem has the following qualities and resiliences: Problem persists no matter what the target table is. RAM usage is lowish (45%) - there's plenty of spare RAM available for SSIS buffers or SQL Server to use. Perfmon shows buffers are not spooling, disk response times are normal, disk availability is high. CPU usage is low (hovers around 25% shared between sqlserver.exe and DtsDebugHost.exe) Disk activity primarily on TempDB.mdf, but I/O is very low (< 600 Kb/s) OLE DB destination and SQL Server Destination both exhibit this problem. To sum it up, I expect either disk, CPU or RAM to be exhausted before the package slows down, but instead its as if the SSIS package is taking an afternoon nap. SQL server remains responsive to other queries, and I can't find any performance counters or logged events that betray the cause of the problem. I'll gratefully reward any reasonable answers / suggestions.

Read the article
( Sql Server 2005 C#.Net ) - I want just the insert query for a temp table.

- by John Stephen

Hi..I am using C#.Net and Sql Server ( Windows Application ). I had created a temporary table. When a button is clicked, temporary table (#tmp_emp_details) is created. I am having another button called "insert Values" and also 5 textboxes. The values that are entered in the textbox are used and whenever com.ExecuteNonQuery(); line comes, it throws an error message called "Invalid object name '#tbl_emp_answer'.". Below is the set of code..Please give me a solution. Code for insert (in insert value button): private void btninsertvalues_Click(object sender, EventArgs e) { username = txtusername.Text; examloginid = txtexamloginid.Text; question = txtquestion.Text; answer = txtanswer.Text; useranswer = txtanswer.Text; SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=tempdb;Integrated Security=True;"); SqlCommand com = new SqlCommand("Insert into #tbl_emp_answer values('"+username+"','"+examloginid+"','"+question+"','"+answer+"','"+useranswer+"')", con); con.Open(); com.ExecuteNonQuery(); con.Close(); }

Read the article
How to diagnose repeated "Starting up database '<dbname>'"

- by Richard Slater

I have a SQL 2008 server which is predominantly used as a development server, in the last two weeks it has been having occasional "fits", I have isolated the cause of these fits as CHECKDB being run almost continuiously, the following log information is logged to the Windows Event Log (Source: MSSQLSERVER, Category: Server): Event: 1073758961, Message: Starting up database 'DBName1'. Event: 1073758961, Message: Starting up database 'DBName2'. Event: 1073759397, Message: CHECKDB for database 'DBName1' finished without errors on 2010-07-19 20:29:26.993 (local time). This is an informational message only; no user action is required. Event: 1073759397, Message: CHECKDB for database 'DBName1' finished without errors on 2010-07-19 20:29:26.993 (local time). This is an informational message only; no user action is required. This is repeated every 1-2 seconds untill SQL Server is restarted or the offending databases are detatched. I initially thought that it was a problem with the databases so I took a backup and restored them to a SQL Express instance, all of the data is in tact, and CHECKDB runs without problem. The two databases that were causing a problem last week were not being used; so I took full backups of them and detached the databases, this resolved the problem. However at 0100 GMT this morning to other totally unrelated databases started showing the same problems. There is nothing in the event log to suggest that something happened to the server such as a restart, there are no messages about processes crashing or issues being detected with the storage controller. Speaking to the owner of the company this computer has suffered from "gremlins" in the past, however advice was taken and the motherboard was replaced and the computer rebuilt, memory and processor are the same. Stats: O/S: Windows 2008 Standard Build 6002 CPU: 2x Pentium Dual-Core E5200 @ 2.5GHz RAM: 2GB SQL: 2008 Standard 10.0.2531 Edit: someone posted then deleted a comment about AutoClose, it was turned on on the databases affected. It seems that best practice is to disable it so I have done that with the folllowing. EXECUTE sp_MSforeachdb 'IF (''?'' NOT IN (''master'', ''tempdb'', ''msdb'', ''model'')) EXECUTE (''ALTER DATABASE [?] SET AUTO_CLOSE OFF WITH NO_WAIT'')' I won't know if the problem recurs for some time so I am still open to further answers.

Read the article
SQL SERVER – Simple Example of Snapshot Isolation – Reduce the Blocking Transactions

- by pinaldave

To learn any technology and move to a more advanced level, it is very important to understand the fundamentals of the subject first. Today, we will be talking about something which has been quite introduced a long time ago but not properly explored when it comes to the isolation level. Snapshot Isolation was introduced in SQL Server in 2005. However, the reality is that there are still many software shops which are using the SQL Server 2000, and therefore cannot be able to maintain the Snapshot Isolation. Many software shops have upgraded to the later version of the SQL Server, but their respective developers have not spend enough time to upgrade themselves with the latest technology. “It works!” is a very common answer of many when they are asked about utilizing the new technology, instead of backward compatibility commands. In one of the recent consultation project, I had same experience when developers have “heard about it” but have no idea about snapshot isolation. They were thinking it is the same as Snapshot Replication – which is plain wrong. This is the same demo I am including here which I have created for them. In Snapshot Isolation, the updated row versions for each transaction are maintained in TempDB. Once a transaction has begun, it ignores all the newer rows inserted or updated in the table. Let us examine this example which shows the simple demonstration. This transaction works on optimistic concurrency model. Since reading a certain transaction does not block writing transaction, it also does not block the reading transaction, which reduced the blocking. First, enable database to work with Snapshot Isolation. Additionally, check the existing values in the table from HumanResources.Shift. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO Now, we will need two different sessions to prove this example. First Session: Set Transaction level isolation to snapshot and begin the transaction. Update the column “ModifiedDate” to today’s date. -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO Please note that we have not yet been committed to the transaction. Now, open the second session and run the following “SELECT” statement. Then, check the values of the table. Please pay attention on setting the Isolation level for the second one as “Snapshot” at the same time when we already start the transaction using BEGIN TRAN. -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values in the table are still original values. They have not been modified yet. Once again, go back to session 1 and begin the transaction. -- Session 1 COMMIT After that, go back to Session 2 and see the values of the table. -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values are yet not changed and they are still the same old values which were there right in the beginning of the session. Now, let us commit the transaction in the session 2. Once committed, run the same SELECT statement once more and see what the result is. -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that it now reflects the new updated value. I hope that this example is clear enough as it would give you good idea how the Snapshot Isolation level works. There is much more to write about an extra level, READ_COMMITTED_SNAPSHOT, which we will be discussing in another post soon. If you wish to use this transaction’s Isolation level in your production database, I would appreciate your comments about their performance on your servers. I have included here the complete script used in this example for your quick reference. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 COMMIT -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Transaction Isolation

Read the article
SQL SERVER – PAGEIOLATCH_DT, PAGEIOLATCH_EX, PAGEIOLATCH_KP, PAGEIOLATCH_SH, PAGEIOLATCH_UP – Wait Type – Day 9 of 28

- by pinaldave

It is very easy to say that you replace your hardware as that is not up to the mark. In reality, it is very difficult to implement. It is really hard to convince an infrastructure team to change any hardware because they are not performing at their best. I had a nightmare related to this issue in a deal with an infrastructure team as I suggested that they replace their faulty hardware. This is because they were initially not accepting the fact that it is the fault of their hardware. But it is really easy to say “Trust me, I am correct”, while it is equally important that you put some logical reasoning along with this statement. PAGEIOLATCH_XX is such a kind of those wait stats that we would directly like to blame on the underlying subsystem. Of course, most of the time, it is correct – the underlying subsystem is usually the problem. From Book On-Line: PAGEIOLATCH_DT Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Destroy mode. Long waits may indicate problems with the disk subsystem. PAGEIOLATCH_EX Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Exclusive mode. Long waits may indicate problems with the disk subsystem. PAGEIOLATCH_KP Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Keep mode. Long waits may indicate problems with the disk subsystem. PAGEIOLATCH_SH Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem. PAGEIOLATCH_UP Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Update mode. Long waits may indicate problems with the disk subsystem. PAGEIOLATCH_XX Explanation: Simply put, this particular wait type occurs when any of the tasks is waiting for data from the disk to move to the buffer cache. ReducingPAGEIOLATCH_XX wait: Just like any other wait type, this is again a very challenging and interesting subject to resolve. Here are a few things you can experiment on: Improve your IO subsystem speed (read the first paragraph of this article, if you have not read it, I repeat that it is easy to say a step like this than to actually implement or do it). This type of wait stats can also happen due to memory pressure or any other memory issues. Putting aside the issue of a faulty IO subsystem, this wait type warrants proper analysis of the memory counters. If due to any reasons, the memory is not optimal and unable to receive the IO data. This situation can create this kind of wait type. Proper placing of files is very important. We should check file system for the proper placement of files – LDF and MDF on separate drive, TempDB on separate drive, hot spot tables on separate filegroup (and on separate disk), etc. Check the File Statistics and see if there is higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats. It is very possible that there are no proper indexes on the system and there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can significantly reduce lots of CPU, Memory and IO (considering cover index has much lesser columns than cluster table and all other it depends conditions). You can refer to the two articles’ links below previously written by me that talk about how to optimize indexes. Create Missing Indexes Drop Unused Indexes Updating statistics can help the Query Optimizer to render optimal plan, which can only be either directly or indirectly. I have seen that updating statistics with full scan (again, if your database is huge and you cannot do this – never mind!) can provide optimal information to SQL Server optimizer leading to efficient plan. Checking Memory Related Perfmon Counters SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2) SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark) SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system) SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds) Memory: Available Mbytes (Information only) Memory: Page Faults/sec (Benchmark only) Memory: Pages/sec (Benchmark only) Checking Disk Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All of the discussions of Wait Stats in this blog is generic and varies from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article
SQL SERVER – Simple Example of Snapshot Isolation – Reduce the Blocking Transactions

- by pinaldave

To learn any technology and move to a more advanced level, it is very important to understand the fundamentals of the subject first. Today, we will be talking about something which has been quite introduced a long time ago but not properly explored when it comes to the isolation level. Snapshot Isolation was introduced in SQL Server in 2005. However, the reality is that there are still many software shops which are using the SQL Server 2000, and therefore cannot be able to maintain the Snapshot Isolation. Many software shops have upgraded to the later version of the SQL Server, but their respective developers have not spend enough time to upgrade themselves with the latest technology. “It works!” is a very common answer of many when they are asked about utilizing the new technology, instead of backward compatibility commands. In one of the recent consultation project, I had same experience when developers have “heard about it” but have no idea about snapshot isolation. They were thinking it is the same as Snapshot Replication – which is plain wrong. This is the same demo I am including here which I have created for them. In Snapshot Isolation, the updated row versions for each transaction are maintained in TempDB. Once a transaction has begun, it ignores all the newer rows inserted or updated in the table. Let us examine this example which shows the simple demonstration. This transaction works on optimistic concurrency model. Since reading a certain transaction does not block writing transaction, it also does not block the reading transaction, which reduced the blocking. First, enable database to work with Snapshot Isolation. Additionally, check the existing values in the table from HumanResources.Shift. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO Now, we will need two different sessions to prove this example. First Session: Set Transaction level isolation to snapshot and begin the transaction. Update the column “ModifiedDate” to today’s date. -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO Please note that we have not yet been committed to the transaction. Now, open the second session and run the following “SELECT” statement. Then, check the values of the table. Please pay attention on setting the Isolation level for the second one as “Snapshot” at the same time when we already start the transaction using BEGIN TRAN. -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values in the table are still original values. They have not been modified yet. Once again, go back to session 1 and begin the transaction. -- Session 1 COMMIT After that, go back to Session 2 and see the values of the table. -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values are yet not changed and they are still the same old values which were there right in the beginning of the session. Now, let us commit the transaction in the session 2. Once committed, run the same SELECT statement once more and see what the result is. -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that it now reflects the new updated value. I hope that this example is clear enough as it would give you good idea how the Snapshot Isolation level works. There is much more to write about an extra level, READ_COMMITTED_SNAPSHOT, which we will be discussing in another post soon. If you wish to use this transaction’s Isolation level in your production database, I would appreciate your comments about their performance on your servers. I have included here the complete script used in this example for your quick reference. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 COMMIT -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Transaction Isolation

Read the article
Coping with infrastructure upgrades

- by Fatherjack

A common topic for questions on SQL Server forums is how to plan and implement upgrades to SQL Server. Moving from old to new hardware or moving from one version of SQL Server to another. There are other circumstances where upgrades of other systems affect SQL Server DBAs. For example, where I work at the moment there is an Microsoft Exchange (email) server upgrade in progress. It it being handled by a different team so I’m not wholly sure on the details but we are in a situation where there are currently 2 Exchange email servers – the old one and the new one. Users mail boxes are being transferred in a planned process but as we approach the old server being turned off we have to also make sure that our SQL Servers get updated to use the new SMTP server for all of the SQL Agent notifications, SSIS packages etc. My servers have a number of profiles so that various jobs can send emails on behalf of various departments and different systems. This means there are lots of places that the old server name needs to be replaced by the new one. Anyone who has set up DBMail and enjoyed the click-tastic odyssey of screens to create Profiles and Accounts and so on and so forth ought to seek some professional help in my opinion. It’s a nightmare of back and forth settings changes and it stinks. I wasn’t looking forward to heading into this mess of a UI and changing the old Exchange server name for the new one on all my SQL Instances for all of the accounts I have set up. So I did what any Englishmen with a shed would do, I decided to take it apart and see if I can fix it another way. I took a guess that we are going to be working in MSDB and Books OnLine was remarkably helpful and amongst a lot of information told me about a couple of procedures that can be used to interrogate DBMail settings. USE [msdb] -- It's where all the good stuff is kept GO EXEC dbo.sysmail_help_profile_sp; EXEC dbo.sysmail_help_account_sp; Both of these procedures take optional parameters with the same name – ID and Name. If you provide an ID or a name then the results you get back are for that specific Profile or Account. Otherwise you get details of all Profiles and Accounts on the server you are connected to. As you can see (click for a bigger image), the Account has the SMTP server information in the servername column. We want to change that value to NewSMTP.Contoso.com. Now it appears that the procedure we are looking at gets it’s data from the sysmail_account and sysmail_server tables, you can get the results the stored procedure provides if you run the code below. SELECT [account_id] , [name] , [description] , [email_address] , [display_name] , [replyto_address] , [last_mod_datetime] , [last_mod_user] FROM dbo.sysmail_account AS sa; SELECT [account_id] , [servertype] , [servername] , [port] , [username] , [credential_id] , [use_default_credentials] , [enable_ssl] , [flags] , [last_mod_datetime] , [last_mod_user] , [timeout] FROM dbo.sysmail_server AS sms Now, we have no real idea how these tables are linked and whether making an update direct to one or other of them is going to do what we want or whether it will entirely cripple our ability to send email from SQL Server so we wont touch those tables with any UPDATE TSQL. So, back to Books OnLine then and we find sysmail_update_account_sp. It’s exactly what we need. The examples in BOL take the form (as below) of having every parameter explicitly defined. Not wanting to totally obliterate the existing values by not passing values in all of the parameters I set to writing some code to gather the existing data from the tables and re-write the SMTP server name and then execute the resulting TSQL. IF OBJECT_ID('tempdb..#sysmailprofiles') IS NOT NULL DROP TABLE #sysmailprofiles GO CREATE TABLE #sysmailprofiles ( account_id INT , [name] VARCHAR(50) , [description] VARCHAR(500) , email_address VARCHAR(500) , display_name VARCHAR(500) , replyto_address VARCHAR(500) , servertype VARCHAR(10) , servername VARCHAR(100) , port INT , username VARCHAR(100) , use_default_credentials VARCHAR(1) , ENABLE_ssl VARCHAR(1) ) INSERT [#sysmailprofiles] ( [account_id] , [name] , [description] , [email_address] , [display_name] , [replyto_address] , [servertype] , [servername] , [port] , [username] , [use_default_credentials] , [ENABLE_ssl] ) EXEC [dbo].[sysmail_help_account_sp] DECLARE @TSQL NVARCHAR(1000) SELECT TOP 1 @TSQL = 'EXEC [dbo].[sysmail_update_account_sp] @account_id = ' + CAST([s].[account_id] AS VARCHAR(20)) + ', @account_name = ''' + [s].[name] + '''' + ', @email_address = N''' + [s].[email_address] + '''' + ', @display_name = N''' + [s].[display_name] + '''' + ', @replyto_address = N''' + s.replyto_address + '''' + ', @description = N''' + [s].[description] + '''' + ', @mailserver_name = ''NEWSMTP.contoso.com''' + +', @mailserver_type = ' + [s].[servertype] + ', @port = ' + CAST([s].[port] AS VARCHAR(20)) + ', @username = ' + COALESCE([s].[username], '''''') + ', @use_default_credentials =' + CAST(s.[use_default_credentials] AS VARCHAR(1)) + ', @enable_ssl =' + [s].[ENABLE_ssl] FROM [#sysmailprofiles] AS s WHERE [s].[servername] = 'SMTP.Contoso.com' SELECT @tsql EXEC [sys].[sp_executesql] @tsql This worked well for me and testing the email function EXEC dbo.sp_send_dbmail afterwards showed that the settings were indeed using our new Exchange server. It was only later in writing this blog that I tried running the sysmail_update_account_sp procedure with only the SMTP server name parameter value specified. Despite what Books OnLine might intimate, you can do this and only the values for parameters specified get changed. If a parameter is not specified in the execution of the procedure then the values remain unchanged. This renders most of the above script unnecessary as I could have simply specified the account_id that I want to amend and the new value for the parameter I want to update. EXEC sysmail_update_account_sp @account_id = 1, @mailserver_name = 'NEWSMTP.Contoso.com' This wasn’t going to be the main reason for this post, it was meant to describe how to capture values from a stored procedure and use them in dynamic TSQL but instead we are here and (re)learning the fact that Books Online is a little flawed in places. It is a fantastic resource for anyone working with SQL Server but the reader must adopt an enquiring frame of mind and use a little curiosity to try simple variations on examples to fully understand the code you are working with. I think the author(s) of this part of Books OnLine missed an opportunity to include a third example that had fewer than all parameters specified to give a lead to this method existing.

Read the article
Database Owner Conundrum

- by Johnm

Have you ever restored a database from a production environment on Server A into a development environment on Server B and had some items, such as Service Broker, mysteriously cease functioning? You might want to consider reviewing the database owner property of the database. The Scenario Recently, I was developing some messaging functionality that utilized the Service Broker feature of SQL Server in a development environment. Within the instance of the development environment resided two databases: One was a restored version of a production database that we will call "RestoreDB". The second database was a brand new database that has yet to exist in the production environment that we will call "DevDB". The goal is to setup a communication path between RestoreDB and DevDB that will later be implemented into the production database. After implementing all of the Service Broker objects that are required to communicate within a database as well as between two databases on the same instance I found myself a bit confounded. My testing was showing that the communication was successful when it was occurring internally within DevDB; but the communication between RestoreDB and DevDB did not appear to be working. Profiler to the rescue After carefully reviewing my code for any misspellings, missing commas or any other minor items that might be a syntactical cause of failure, I decided to launch Profiler to aid in the troubleshooting. After simulating the cross database messaging, I noticed the following error appearing in Profiler: An exception occurred while enqueueing a message in the target queue. Error: 33009, State: 2. The database owner SID recorded in the master database differs from the database owner SID recorded in database '[Database Name Here]'. You should correct this situation by resetting the owner of database '[Database Name Here]' using the ALTER AUTHORIZATION statement. Now, this error message is a helpful one. Not only does it identify the issue in plain language, it also provides a potential solution. An execution of the following query that utilizes the catalog view sys.transmission_queue revealed the same error message for each communication attempt: SELECT * FROM sys.transmission_queue; Seeing the situation as a learning opportunity I dove a bit deeper. Reviewing the database properties The owner of a specific database can be easily viewed by right-clicking the database in SQL Server Management Studio and selecting the "properties" option. The owner is listed on the "General" page of the properties screen. In my scenario, the database in the production server was created by Frank the DBA; therefore his server login appeared as the owner: "ServerName\Frank". While this is interesting information, it certainly doesn't tell me much in regard to the SID (security identifier) and its existence, or lack thereof, in the master database as the error suggested. I pulled together the following query to gather more interesting information: SELECT a.name , a.owner_sid , b.sid , b.name , b.type_desc FROM master.sys.databases a LEFT OUTER JOIN master.sys.server_principals b ON a.owner_sid = b.sid WHERE a.name not in ('master','tempdb','model','msdb'); This query also helped identify how many other user databases in the instance were experiencing the same issue. In this scenario, I saw that there were no matching SIDs in server_principals to the owner SID for my database. What login should be used as the database owner instead of Frank's? The system stored procedure sp_helplogins will provide a list of the valid logins that can be used. Here is an example of its use, revealing all available logins: EXEC sp_helplogins; Fixing a hole The error message stated that the recommended solution was to execute the ALTER AUTHORIZATION statement. The full statement for this scenario would appear as follows: ALTER AUTHORIZATION ON DATABASE:: [Database Name Here] TO [Login Name]; Another option is to execute the following statement using the sp_changedbowner system stored procedure; but please keep in mind that this stored procedure has been deprecated and will likely disappear in future versions of SQL Server: EXEC dbo.sp_changedbowner @loginname = [Login Name]; .And They Lived Happily Ever After Upon changing the database owner to an existing login and simulating the inner and cross database messaging the errors have ceased. More importantly, all messages sent through this feature now successfully complete their journey. I have added the ownership change to my restoration script for the development environment.

Read the article
Disaster Recovery Discovery

- by Rodney Landrum

Last weekend I joined several of my IT staff on a mission to perform a DR test in our remote CoLo center in a large South East city of the US. Can I be more obtuse? The goal was simple for me as the sole DBA in a throng of Windows, Storage, Network and SAN admins – restore the databases and make them work. There were 4 applications that back ended to 7 SQL Server databases on 4 different SQL Server instances. We would maintain the original server names, but beyond that it was fair game. We had time to prepare so I was able to script out or otherwise automate the recovery process. I used sp_help_revlogin for three of the servers, a bit of a cheat actually because restoring the Master database on the target DR servers was the specified course of action according to the DR procedures ( the caveat “IF REQUIRED” left it open to interpretation. I really wanted to avoid the step of restoring Master for a number of reasons but mainly because I did not want to deal with issues starting SQL Services afterward. Having to account for the location of TempDB and the version conflicts of the resource DBs were just two of the battles I chose not to fight. Not to mention other system database location problems that might arise and prevent SQL from starting. I was going to have to restore all of the user databases anyway, so I would not really gain any benefit, outside of logins, for taking the time to restore the source Master database over the newly installed one on the fresh server. What I wanted was the ability to restore the Master database as a user database, call it Master_Mine, from a backup on the source system and then use that restored database to script the SQL Logins and passwords on the DR systems. While I did not attempt this on the trip, the thought stuck in my mind and this past week I succeeded at scripting user accounts and passwords using only a restored copy of the Master database. Granted there were several challenges to overcome. Also, as is usual for any work like this the usual disclaimers apply: This is not something that I would imagine Microsoft would condone or support and this was really only an experiment for me to learn if it was even possible. While I have tested the process with success, I do not know that I would use this technique in a documented procedure because future updates for SQL Server will render this technique non-functional. I thought at first, incorrectly of course, that I could use sp_help_revlogin on a restored copy of the master database I named Master_Mine. Since sp_help_revlogin uses system schema objects, sys.syslogins and sys.server_principals, this was not going to work because all results would come from the main Master database. To test this I added a SQL login via SSMS, backed up Master, restored it as Master_Mine, and then deleted the login. Even though the test account I created should presumably still be in the Master_Mine database, I should be able to get to it and script out its creation with its password hash so that I would not need to know the password, but any applications that stored that password would not have to be altered in the DR scenario. They would just work as expected. Once I realized that would not work I began looking deeper. Knowing that sys.syslogins and sys.server_principals are system views, their underlying code should be available with sp_helptext, right? They were. And this led me to discover the two tables sys.sysxlgns and sys.sysprivs, where the data I needed was stored. These tables existed in both the real Master and the restored copy, Master_Mine. I used this information to tweak the sp_help_revlogin stored procedure to use these tables instead to create the logins cursor used in sp_help_revlogin. For the password hash, sp_help_revlogin uses the function LoginProperty() which takes a user name and option ‘passwordhash’ to return the hash for the user. Unfortunately, it requires the login to exist in the Master database. This would not work. So another slight modification I had to make was to pull the password hash itself (pwdhash from sys.sysxlgns) into the logins cursor and comment out the section of sp_help_revlogin that uses LoginProperty. Instead, I pass the pwdhash value as the variable @PWD_varbinary to the sp_hexadecimal stored procedure which is also created by and used within the code provided by Microsoft in the link above for sp_help_revlogin. The final challenge: sys.sysxlgns and sys.server_principals are visible only within a Dedicated Administrator Connection (DAC) query window in SSMS or within SQLCDMD. To open a DAC connection you have to be logged in on the SQL Server itself, via RDP in my case, and you preface the server name in the query connection with ADMIN:, so that the server connection looks like ADMIN:ServerName. From there you can create the modified stored procedure in the restored copy of a Master database from a source system as whatever name you like, and then run the modified stored procedure. I named my new stored procedure usp_help_revlogin_MyMaster. Upon execution I was happy to see the logins and password hashes that I needed to apply from the source Master database without having to restore over the new Master system database and without the need to access the original server (assuming it was down due to whatever disaster put it in that state). You will note that I am not providing full code samples here of the modifications. I will say that it was a slight bit of work and anyone who needed to do this for whatever reason, could fairly easily roll their own solution with the information provided herein. My goal, as I said was to prove that this could be done and provide another option if required to ease the burden of getting SQL Servers up and available in an emergency situation where alternatives may be more challenging or otherwise unavailable.

Read the article
Is READ UNCOMMITTED / NOLOCK safe in this situation?

- by Ben Challenor

I know that snapshot isolation would fix this problem, but I'm wondering if NOLOCK is safe in this specific case so that I can avoid the overhead. I have a table that looks something like this: drop table Data create table Data ( Id BIGINT NOT NULL, Date BIGINT NOT NULL, Value BIGINT, constraint Cx primary key (Date, Id) ) create nonclustered index Ix on Data (Id, Date) There are no updates to the table, ever. Deletes can occur but they should never contend with the SELECT because they affect the other, older end of the table. Inserts are regular and page splits to the (Id, Date) index are extremely common. I have a deadlock situation between a standard INSERT and a SELECT that looks like this: select top 1 Date, Value from Data where Id = @p0 order by Date desc because the INSERT acquires a lock on Cx (Date, Id; Value) and then Ix (Id, Date), but the SELECT acquires a lock on Ix (Id, Date) and then Cx (Date, Id; Value). This is because the SELECT first seeks on Ix and then joins to a seek on Cx. Swapping the clustered and non-clustered index would break this cycle, but it is not an acceptable solution because it would introduce cycles with other (more complex) SELECTs. If I add NOLOCK to the SELECT, can it go wrong in this case? Can it return: More than one row, even though I asked for TOP 1? No rows, even though one exists and has been committed? Worst of all, a row that doesn't satisfy the WHERE clause? I've done a lot of reading about this online, but the only reproductions of over- or under-count anomalies I've seen (one, two) involve a scan. This involves only seeks. Jeff Atwood has a post about using NOLOCK that generated a good discussion. I was particularly interested in a comment by Rick Townsend: Secondly, if you read dirty data, the risk you run is of reading the entirely wrong row. For example, if your select reads an index to find your row, then the update changes the location of the rows (e.g.: due to a page split or an update to the clustered index), when your select goes to read the actual data row, it's either no longer there, or a different row altogether! Is this possible with inserts only, and no updates? If so, then I guess even my seeks on an insert-only table could be dangerous. Update: I'm trying to figure out how snapshot isolation works. It seems to be row-based, where transactions read the table (with no shared lock!), find the row they are interested in, and then see if they need to get an old version of the row from the version store in tempdb. But in my case, no row will have more than one version, so the version store seems rather pointless. And if the row was found with no shared lock, how is it different to just using NOLOCK?

Read the article
How to list all duplicated rows which may include NULL columns?

- by Yousui

Hi guys, I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first. USE [tempdb]; GO IF OBJECT_ID(N'dbo.t') IS NOT NULL BEGIN DROP TABLE dbo.t END GO CREATE TABLE dbo.t ( a NVARCHAR(8), b NVARCHAR(8) ); GO INSERT t VALUES ('a', 'b'); INSERT t VALUES ('a', 'b'); INSERT t VALUES ('a', 'b'); INSERT t VALUES ('c', 'd'); INSERT t VALUES ('c', 'd'); INSERT t VALUES ('c', 'd'); INSERT t VALUES ('c', 'd'); INSERT t VALUES ('e', NULL); INSERT t VALUES (NULL, NULL); INSERT t VALUES (NULL, NULL); INSERT t VALUES (NULL, NULL); INSERT t VALUES (NULL, NULL); GO Now I want to show all rows that have other rows duplicated with them, I use the following query. SELECT a, b FROM dbo.t GROUP BY a, b HAVING count(*) > 1 which will give us the result: a b -------- -------- NULL NULL a b c d Now if I want to list all rows that make contribution to duplication, I use this query: WITH duplicate (a, b) AS ( SELECT a, b FROM dbo.t GROUP BY a, b HAVING count(*) > 1 ) SELECT dbo.t.a, dbo.t.b FROM dbo.t INNER JOIN duplicate ON (dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b) Which will give me the result: a b -------- -------- a b a b a b c d c d c d c d As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to WITH duplicate (a, b) AS ( SELECT a, b FROM dbo.t GROUP BY a, b HAVING count(*) > 1 ) SELECT dbo.t.a, dbo.t.b FROM dbo.t INNER JOIN duplicate ON (dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b) OR (dbo.t.a IS NULL AND duplicate.a IS NULL AND dbo.t.b = duplicate.b) OR (dbo.t.b IS NULL AND duplicate.b IS NULL AND dbo.t.a = duplicate.a) OR (dbo.t.a IS NULL AND duplicate.a IS NULL AND dbo.t.b IS NULL AND duplicate.b IS NULL) And this query will give me the answer as I wanted: a b -------- -------- NULL NULL NULL NULL NULL NULL NULL NULL a b a b a b c d c d c d c d Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem? Great thanks.

Read the article
SQL SERVER – Shrinking Database is Bad – Increases Fragmentation – Reduces Performance

- by pinaldave

Earlier, I had written two articles related to Shrinking Database. I wrote about why Shrinking Database is not good. SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server SQL SERVER – What the Business Says Is Not What the Business Wants I received many comments on Why Database Shrinking is bad. Today we will go over a very interesting example that I have created for the same. Here are the quick steps of the example. Create a test database Create two tables and populate with data Check the size of both the tables Size of database is very low Check the Fragmentation of one table Fragmentation will be very low Truncate another table Check the size of the table Check the fragmentation of the one table Fragmentation will be very low SHRINK Database Check the size of the table Check the fragmentation of the one table Fragmentation will be very HIGH REBUILD index on one table Check the size of the table Size of database is very HIGH Check the fragmentation of the one table Fragmentation will be very low Here is the script for the same. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO Let us check the table size and fragmentation. Now let us TRUNCATE the table and check the size and Fragmentation. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can clearly see that after TRUNCATE, the size of the database is not reduced and it is still the same as before TRUNCATE operation. After the Shrinking database operation, we were able to reduce the size of the database. If you notice the fragmentation, it is considerably high. The major problem with the Shrink operation is that it increases fragmentation of the database to very high value. Higher fragmentation reduces the performance of the database as reading from that particular table becomes very expensive. One of the ways to reduce the fragmentation is to rebuild index on the database. Let us rebuild the index and observe fragmentation and database size. -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REBUILD GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can notice that after rebuilding, Fragmentation reduces to a very low value (almost same to original value); however the database size increases way higher than the original. Before rebuilding, the size of the database was 5 MB, and after rebuilding, it is around 20 MB. Regular rebuilding the index is rebuild in the same user database where the index is placed. This usually increases the size of the database. Look at irony of the Shrinking database. One person shrinks the database to gain space (thinking it will help performance), which leads to increase in fragmentation (reducing performance). To reduce the fragmentation, one rebuilds index, which leads to size of the database to increase way more than the original size of the database (before shrinking). Well, by Shrinking, one did not gain what he was looking for usually. Rebuild indexing is not the best suggestion as that will create database grow again. I have always remembered the excellent post from Paul Randal regarding Shrinking the database is bad. I suggest every one to read that for accuracy and interesting conversation. Let us run following script where we Shrink the database and REORGANIZE. -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Shrink the Database DBCC SHRINKDATABASE (ShrinkIsBed); GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REORGANIZE GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can see that REORGANIZE does not increase the size of the database or remove the fragmentation. Again, I no way suggest that REORGANIZE is the solution over here. This is purely observation using demo. Read the blog post of Paul Randal. Following script will clean up the database -- Clean up USE MASTER GO ALTER DATABASE ShrinkIsBed SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE ShrinkIsBed GO There are few valid cases of the Shrinking database as well, but that is not covered in this blog post. We will cover that area some other time in future. Additionally, one can rebuild index in the tempdb as well, and we will also talk about the same in future. Brent has written a good summary blog post as well. Are you Shrinking your database? Well, when are you going to stop Shrinking it? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – Weekly Series – Memory Lane – #035

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Row Overflow Data Explanation In SQL Server 2005 one table row can contain more than one varchar(8000) fields. One more thing, the exclusions has exclusions also the limit of each individual column max width of 8000 bytes does not apply to varchar(max), nvarchar(max), varbinary(max), text, image or xml data type columns. Comparison Index Fragmentation, Index De-Fragmentation, Index Rebuild – SQL SERVER 2000 and SQL SERVER 2005 An old but like a gold article. Talks about lots of concepts related to Index and the difference from earlier version to the newer version. I strongly suggest that everyone should read this article just to understand how SQL Server has moved forward with the technology. Improvements in TempDB SQL Server 2005 had come up with quite a lots of improvements and this blog post describes them and explains the same. If you ask me what is my the most favorite article from early career. I must point out to this article as when I wrote this one I personally have learned a lot of new things. Recompile All The Stored Procedure on Specific TableI prefer to recompile all the stored procedure on the table, which has faced mass insert or update. sp_recompiles marks stored procedures to recompile when they execute next time. This blog post explains the same with the help of a script. 2008 SQLAuthority Download – SQL Server Cheatsheet You can download and print this cheat sheet and use it for your personal reference. If you have any suggestions, please let me know and I will see if I can update this SQL Server cheat sheet. Difference Between DBMS and RDBMS What is the difference between DBMS and RDBMS? DBMS – Data Base Management System RDBMS – Relational Data Base Management System or Relational DBMS High Availability – Hot Add Memory Hot Add CPU and Hot Add Memory are extremely interesting features of the SQL Server, however, personally I have not witness them heavily used. These features also have few restriction as well. I blogged about them in detail. 2009 Delete Duplicate Rows I have demonstrated in this blog post how one can identify and delete duplicate rows. Interesting Observation of Logon Trigger On All Servers – Solution The question I put forth in my previous article was – In single login why the trigger fires multiple times; it should be fired only once. I received numerous answers in thread as well as in my MVP private news group. Now, let us discuss the answer for the same. The answer is – It happens because multiple SQL Server services are running as well as intellisense is turned on. Blog post demonstrates how we can do the same with the help of SQL scripts. Management Studio New Features I have selected my favorite 5 features and blogged about it. IntelliSense for Query Editing Multi Server Query Query Editor Regions Object Explorer Enhancements Activity Monitors Maximum Number of Index per Table One of the questions I asked in my user group was – What is the maximum number of Index per table? I received lots of answers to this question but only two answers are correct. Let us now take a look at them in this blog post. 2010 Default Statistics on Column – Automatic Statistics on Column The truth is, Statistics can be in a table even though there is no Index in it. If you have the auto- create and/or auto-update Statistics feature turned on for SQL Server database, Statistics will be automatically created on the Column based on a few conditions. Please read my previously posted article, SQL SERVER – When are Statistics Updated – What triggers Statistics to Update, for the specific conditions when Statistics is updated. 2011 T-SQL Scripts to Find Maximum between Two Numbers In this blog post there are two different scripts listed which demonstrates way to find the maximum number between two numbers. I need your help, which one of the script do you think is the most accurate way to find maximum number? Find Details for Statistics of Whole Database – DMV – T-SQL Script I was recently asked is there a single script which can provide all the necessary details about statistics for any database. This question made me write following script. I was initially planning to use sp_helpstats command but I remembered that this is marked to be deprecated in future. 2012 Introduction to Function SIGN SIGN Function is very fundamental function. It will return the value 1, -1 or 0. If your value is negative it will return you negative -1 and if it is positive it will return you positive +1. Let us start with a simple small example. Template Browser – A Very Important and Useful Feature of SSMS Templates are like a quick cheat sheet or quick reference. Templates are available to create objects like databases, tables, views, indexes, stored procedures, triggers, statistics, and functions. Templates are also available for Analysis Services as well. The template scripts contain parameters to help you customize the code. You can Replace Template Parameters dialog box to insert values into the script. An invalid floating point operation occurred If you run any of the above functions they will give you an error related to invalid floating point. Honestly there is no workaround except passing the function appropriate values. SQRT of a negative number will give you result in real numbers which is not supported at this point of time as well LOG of a negative number is not possible (because logarithm is the inverse function of an exponential function and the exponential function is NEVER negative). Validating Spatial Object with IsValidDetailed Function SQL Server 2012 has introduced the new function IsValidDetailed(). This function has made my life very easy. In simple words, this function will check if the spatial object passed is valid or not. If it is valid it will give information that it is valid. If the spatial object is not valid it will return the answer that it is not valid and the reason for the same. This makes it very easy to debug the issue and make the necessary correction. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Maintenance plans love story

- by Maria Zakourdaev

There are about 200 QA and DEV SQL Servers out there. There is a maintenance plan on many of them that performs a backup of all databases and removes the backup history files. First of all, I must admit that I’m no big fan of maintenance plans in particular or the SSIS packages in general. In this specific case, if I ever need to change anything in the way backup is performed, such as the compression feature or perform some other change, I have to open each plan one by one. This is quite a pain. Therefore, I have decided to replace the maintenance plans with a stored procedure that will perform exactly the same thing. Having such a procedure will allow me to open multiple server connections and just execute an ALTER PROCEDURE whenever I need to change anything in it. There is nothing like good ole T-SQL. The first challenge was to remove the unneeded maintenance plans. Of course, I didn’t want to do it server by server. I found the procedure msdb.dbo.sp_maintplan_delete_plan, but it only has a parameter for the maintenance plan id and it has no other parameters, like plan name, which would have been much more useful. Now I needed to find the table that holds all maintenance plans on the server. You would think that it would be msdb.dbo.sysdbmaintplans but, unfortunately, regardless of the number of maintenance plans on the instance, it contains just one row. After a while I found another table: msdb.dbo.sysmaintplan_subplans. It contains the plan id that I was looking for, in the plan_id column and well as the agent’s job id which is executing the plan’s package: That was all I needed and the rest turned out to be quite easy. Here is a script that can be executed against hundreds of servers from a multi-server query window to drop the specific maintenance plans. DECLARE @PlanID uniqueidentifier SELECT @PlanID = plan_id FROM msdb.dbo.sysmaintplan_subplans Where name like ‘BackupPlan%’ EXECUTE msdb.dbo.sp_maintplan_delete_plan @plan_id=@PlanID The second step was to create a procedure that will perform all of the old maintenance plan tasks: create a folder for each database, backup all databases on the server and clean up the old files. The script is below. Enjoy. ALTER PROCEDURE BackupAllDatabases @PrintMode BIT = 1 AS BEGIN DECLARE @BackupLocation VARCHAR(500) DECLARE @PurgeAferDays INT DECLARE @PurgingDate VARCHAR(30) DECLARE @SQLCmd VARCHAR(MAX) DECLARE @FileName VARCHAR(100) SET @PurgeAferDays = -14 SET @BackupLocation = '\\central_storage_servername\BACKUPS\'+@@servername SET @PurgingDate = CONVERT(VARCHAR(19), DATEADD (dd,@PurgeAferDays,GETDATE()),126) SET @FileName = '?_full_'+ + REPLACE(CONVERT(VARCHAR(19), GETDATE(),126),':','-') +'.bak'; SET @SQLCmd = ' IF ''?'' <> ''tempdb'' BEGIN EXECUTE master.dbo.xp_create_subdir N'''+@BackupLocation+'\?\'' ; BACKUP DATABASE ? TO DISK = N'''+@BackupLocation+'\?\'+@FileName+''' WITH NOFORMAT, NOINIT, SKIP, REWIND, NOUNLOAD, COMPRESSION, STATS = 10 ; EXECUTE master.dbo.xp_delete_file 0,N'''+@BackupLocation+'\?\'',N''bak'',N'''+@PurgingDate+''',1; END' IF @PrintMode = 1 BEGIN PRINT @SQLCmd END EXEC sp_MSforeachdb @SQLCmd END

Read the article
Heaps of Trouble?

- by Paul White NZ

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material. In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all. The problem is, predictably, that performance is not very good. The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found. Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables. A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings). The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan. The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate. Every join in the plan shown above estimates that it will produce just a single row as output. Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows. It should not surprise you that this has rather dire consequences for the remainder of the query plan. In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each. The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints. A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query. The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement! As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there. (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B). Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear. The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools. If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on. Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID. The term ‘eager’ means that the spool consumes all of its input rows when it starts up. The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index. From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table. The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself. The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation. Nested loops join preserves the order of rows on its outer input, so sorting early is safe. (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input. It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation. It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins. You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins. The answer, again, is down to the poor information available to the optimizer. Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs. SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk. The join process then continues, and may again run out of memory. This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow. The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion. You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this). Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it? We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs. The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk. Even so, the query runs to completion in three or four seconds. That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information. We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world. It’s there primarily for its educational and entertainment value. I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details. My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
Option Trading: Getting the most out of the event session options

- by extended_events

You can control different aspects of how an event session behaves by setting the event session options as part of the CREATE EVENT SESSION DDL. The default settings for the event session options are designed to handle most of the common event collection situations so I generally recommend that you just use the defaults. Like everything in the real world though, there are going to be a handful of “special cases” that require something different. This post focuses on identifying the special cases and the correct use of the options to accommodate those cases. There is a reason it’s called Default The default session options specify a total event buffer size of 4 MB with a 30 second latency. Translating this into human terms; this means that our default behavior is that the system will start processing events from the event buffer when we reach about 1.3 MB of events or after 30 seconds, which ever comes first. Aside: What’s up with the 1.3 MB, I thought you said the buffer was 4 MB?The Extended Events engine takes the total buffer size specified by MAX_MEMORY (4MB by default) and divides it into 3 equally sized buffers. This is done so that a session can be publishing events to one buffer while other buffers are being processed. There are always at least three buffers; how to get more than three is covered later. Using this configuration, the Extended Events engine can “keep up” with most event sessions on standard workloads. Why is this? The fact is that most events are small, really small; on the order of a couple hundred bytes. Even when you start considering events that carry dynamically sized data (eg. binary, text, etc.) or adding actions that collect additional data, the total size of the event is still likely to be pretty small. This means that each buffer can likely hold thousands of events before it has to be processed. When the event buffers are finally processed there is an economy of scale achieved since most targets support bulk processing of the events so they are processed at the buffer level rather than the individual event level. When all this is working together it’s more likely that a full buffer will be processed and put back into the ready queue before the remaining buffers (remember, there are at least three) are full. I know what you’re going to say: “My server is exceptional! My workload is so massive it defies categorization!” OK, maybe you weren’t going to say that exactly, but you were probably thinking it. The point is that there are situations that won’t be covered by the Default, but that’s a good place to start and this post assumes you’ve started there so that you have something to look at in order to determine if you do have a special case that needs different settings. So let’s get to the special cases… What event just fired?! How about now?! Now?! If you believe the commercial adage from Heinz Ketchup (Heinz Slow Good Ketchup ad on You Tube), some things are worth the wait. This is not a belief held by most DBAs, particularly DBAs who are looking for an answer to a troubleshooting question fast. If you’re one of these anxious DBAs, or maybe just a Program Manager doing a demo, then 30 seconds might be longer than you’re comfortable waiting. If you find yourself in this situation then consider changing the MAX_DISPATCH_LATENCY option for your event session. This option will force the event buffers to be processed based on your time schedule. This option only makes sense for the asynchronous targets since those are the ones where we allow events to build up in the event buffer – if you’re using one of the synchronous targets this option isn’t relevant. Avoid forgotten events by increasing your memory Have you ever had one of those days where you keep forgetting things? That can happen in Extended Events too; we call it dropped events. In order to optimizes for server performance and help ensure that the Extended Events doesn’t block the server if to drop events that can’t be published to a buffer because the buffer is full. You can determine if events are being dropped from a session by querying the dm_xe_sessions DMV and looking at the dropped_event_count field. Aside: Should you care if you’re dropping events?Maybe not – think about why you’re collecting data in the first place and whether you’re really going to miss a few dropped events. For example, if you’re collecting query duration stats over thousands of executions of a query it won’t make a huge difference to miss a couple executions. Use your best judgment. If you find that your session is dropping events it means that the event buffer is not large enough to handle the volume of events that are being published. There are two ways to address this problem. First, you could collect fewer events – examine you session to see if you are over collecting. Do you need all the actions you’ve specified? Could you apply a predicate to be more specific about when you fire the event? Assuming the session is defined correctly, the next option is to change the MAX_MEMORY option to a larger number. Picking the right event buffer size might take some trial and error, but a good place to start is with the number of dropped events compared to the number you’ve collected. Aside: There are three different behaviors for dropping events that you specify using the EVENT_RETENTION_MODE option. The default is to allow single event loss and you should stick with this setting since it is the best choice for keeping the impact on server performance low.You’ll be tempted to use the setting to not lose any events (NO_EVENT_LOSS) – resist this urge since it can result in blocking on the server. If you’re worried that you’re losing events you should be increasing your event buffer memory as described in this section. Some events are too big to fail A less common reason for dropping an event is when an event is so large that it can’t fit into the event buffer. Even though most events are going to be small, you might find a condition that occasionally generates a very large event. You can determine if your session is dropping large events by looking at the dm_xe_sessions DMV once again, this time check the largest_event_dropped_size. If this value is larger than the size of your event buffer [remember, the size of your event buffer, by default, is max_memory / 3] then you need a large event buffer. To specify a large event buffer you set the MAX_EVENT_SIZE option to a value large enough to fit the largest event dropped based on data from the DMV. When you set this option the Extended Events engine will create two buffers of this size to accommodate these large events. As an added bonus (no extra charge) the large event buffer will also be used to store normal events in the cases where the normal event buffers are all full and waiting to be processed. (Note: This is just a side-effect, not the intended use. If you’re dropping many normal events then you should increase your normal event buffer size.) Partitioning: moving your events to a sub-division Earlier I alluded to the fact that you can configure your event session to use more than the standard three event buffers – this is called partitioning and is controlled by the MEMORY_PARTITION_MODE option. The result of setting this option is fairly easy to explain, but knowing when to use it is a bit more art than science. First the science… You can configure partitioning in three ways: None, Per NUMA Node & Per CPU. This specifies the location where sets of event buffers are created with fairly obvious implication. There are rules we follow for sub-dividing the total memory (specified by MAX_MEMORY) between all the event buffers that are specific to the mode used: None: 3 buffers (fixed)Node: 3 * number_of_nodesCPU: 2.5 * number_of_cpus Here are some examples of what this means for different Node/CPU counts: Configuration None Node CPU 2 CPUs, 1 Node 3 buffers 3 buffers 5 buffers 6 CPUs, 2 Node 3 buffers 6 buffers 15 buffers 40 CPUs, 5 Nodes 3 buffers 15 buffers 100 buffers Aside: Buffer size on multi-processor computersAs the number of Nodes or CPUs increases, the size of the event buffer gets smaller because the total memory is sub-divided into more pieces. The defaults will hold up to this for a while since each buffer set is holding events only from the Node or CPU that it is associated with, but at some point the buffers will get too small and you’ll either see events being dropped or you’ll get an error when you create your session because you’re below the minimum buffer size. Increase the MAX_MEMORY setting to an appropriate number for the configuration. The most likely reason to start partitioning is going to be related to performance. If you notice that running an event session is impacting the performance of your server beyond a reasonably expected level [Yes, there is a reasonably expected level of work required to collect events.] then partitioning might be an answer. Before you partition you might want to check a few other things: Is your event retention set to NO_EVENT_LOSS and causing blocking? (I told you not to do this.) Consider changing your event loss mode or increasing memory. Are you over collecting and causing more work than necessary? Consider adding predicates to events or removing unnecessary events and actions from your session. Are you writing the file target to the same slow disk that you use for TempDB and your other high activity databases? <kidding> <not really> It’s always worth considering the end to end picture – if you’re writing events to a file you can be impacted by I/O, network; all the usual stuff. Assuming you’ve ruled out the obvious (and not so obvious) issues, there are performance conditions that will be addressed by partitioning. For example, it’s possible to have a successful event session (eg. no dropped events) but still see a performance impact because you have many CPUs all attempting to write to the same free buffer and having to wait in line to finish their work. This is a case where partitioning would relieve the contention between the different CPUs and likely reduce the performance impact cause by the event session. There is no DMV you can check to find these conditions – sorry – that’s where the art comes in. This is largely a matter of experimentation. On the bright side you probably won’t need to to worry about this level of detail all that often. The performance impact of Extended Events is significantly lower than what you may be used to with SQL Trace. You will likely only care about the impact if you are trying to set up a long running event session that will be part of your everyday workload – sessions used for short term troubleshooting will likely fall into the “reasonably expected impact” category. Hey buddy – I think you forgot something OK, there are two options I didn’t cover: STARTUP_STATE & TRACK_CAUSALITY. If you want your event sessions to start automatically when the server starts, set the STARTUP_STATE option to ON. (Now there is only one option I didn’t cover.) I’m going to leave causality for another post since it’s not really related to session behavior, it’s more about event analysis. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Option Trading: Getting the most out of the event session options

- by extended_events

You can control different aspects of how an event session behaves by setting the event session options as part of the CREATE EVENT SESSION DDL. The default settings for the event session options are designed to handle most of the common event collection situations so I generally recommend that you just use the defaults. Like everything in the real world though, there are going to be a handful of “special cases” that require something different. This post focuses on identifying the special cases and the correct use of the options to accommodate those cases. There is a reason it’s called Default The default session options specify a total event buffer size of 4 MB with a 30 second latency. Translating this into human terms; this means that our default behavior is that the system will start processing events from the event buffer when we reach about 1.3 MB of events or after 30 seconds, which ever comes first. Aside: What’s up with the 1.3 MB, I thought you said the buffer was 4 MB?The Extended Events engine takes the total buffer size specified by MAX_MEMORY (4MB by default) and divides it into 3 equally sized buffers. This is done so that a session can be publishing events to one buffer while other buffers are being processed. There are always at least three buffers; how to get more than three is covered later. Using this configuration, the Extended Events engine can “keep up” with most event sessions on standard workloads. Why is this? The fact is that most events are small, really small; on the order of a couple hundred bytes. Even when you start considering events that carry dynamically sized data (eg. binary, text, etc.) or adding actions that collect additional data, the total size of the event is still likely to be pretty small. This means that each buffer can likely hold thousands of events before it has to be processed. When the event buffers are finally processed there is an economy of scale achieved since most targets support bulk processing of the events so they are processed at the buffer level rather than the individual event level. When all this is working together it’s more likely that a full buffer will be processed and put back into the ready queue before the remaining buffers (remember, there are at least three) are full. I know what you’re going to say: “My server is exceptional! My workload is so massive it defies categorization!” OK, maybe you weren’t going to say that exactly, but you were probably thinking it. The point is that there are situations that won’t be covered by the Default, but that’s a good place to start and this post assumes you’ve started there so that you have something to look at in order to determine if you do have a special case that needs different settings. So let’s get to the special cases… What event just fired?! How about now?! Now?! If you believe the commercial adage from Heinz Ketchup (Heinz Slow Good Ketchup ad on You Tube), some things are worth the wait. This is not a belief held by most DBAs, particularly DBAs who are looking for an answer to a troubleshooting question fast. If you’re one of these anxious DBAs, or maybe just a Program Manager doing a demo, then 30 seconds might be longer than you’re comfortable waiting. If you find yourself in this situation then consider changing the MAX_DISPATCH_LATENCY option for your event session. This option will force the event buffers to be processed based on your time schedule. This option only makes sense for the asynchronous targets since those are the ones where we allow events to build up in the event buffer – if you’re using one of the synchronous targets this option isn’t relevant. Avoid forgotten events by increasing your memory Have you ever had one of those days where you keep forgetting things? That can happen in Extended Events too; we call it dropped events. In order to optimizes for server performance and help ensure that the Extended Events doesn’t block the server if to drop events that can’t be published to a buffer because the buffer is full. You can determine if events are being dropped from a session by querying the dm_xe_sessions DMV and looking at the dropped_event_count field. Aside: Should you care if you’re dropping events?Maybe not – think about why you’re collecting data in the first place and whether you’re really going to miss a few dropped events. For example, if you’re collecting query duration stats over thousands of executions of a query it won’t make a huge difference to miss a couple executions. Use your best judgment. If you find that your session is dropping events it means that the event buffer is not large enough to handle the volume of events that are being published. There are two ways to address this problem. First, you could collect fewer events – examine you session to see if you are over collecting. Do you need all the actions you’ve specified? Could you apply a predicate to be more specific about when you fire the event? Assuming the session is defined correctly, the next option is to change the MAX_MEMORY option to a larger number. Picking the right event buffer size might take some trial and error, but a good place to start is with the number of dropped events compared to the number you’ve collected. Aside: There are three different behaviors for dropping events that you specify using the EVENT_RETENTION_MODE option. The default is to allow single event loss and you should stick with this setting since it is the best choice for keeping the impact on server performance low.You’ll be tempted to use the setting to not lose any events (NO_EVENT_LOSS) – resist this urge since it can result in blocking on the server. If you’re worried that you’re losing events you should be increasing your event buffer memory as described in this section. Some events are too big to fail A less common reason for dropping an event is when an event is so large that it can’t fit into the event buffer. Even though most events are going to be small, you might find a condition that occasionally generates a very large event. You can determine if your session is dropping large events by looking at the dm_xe_sessions DMV once again, this time check the largest_event_dropped_size. If this value is larger than the size of your event buffer [remember, the size of your event buffer, by default, is max_memory / 3] then you need a large event buffer. To specify a large event buffer you set the MAX_EVENT_SIZE option to a value large enough to fit the largest event dropped based on data from the DMV. When you set this option the Extended Events engine will create two buffers of this size to accommodate these large events. As an added bonus (no extra charge) the large event buffer will also be used to store normal events in the cases where the normal event buffers are all full and waiting to be processed. (Note: This is just a side-effect, not the intended use. If you’re dropping many normal events then you should increase your normal event buffer size.) Partitioning: moving your events to a sub-division Earlier I alluded to the fact that you can configure your event session to use more than the standard three event buffers – this is called partitioning and is controlled by the MEMORY_PARTITION_MODE option. The result of setting this option is fairly easy to explain, but knowing when to use it is a bit more art than science. First the science… You can configure partitioning in three ways: None, Per NUMA Node & Per CPU. This specifies the location where sets of event buffers are created with fairly obvious implication. There are rules we follow for sub-dividing the total memory (specified by MAX_MEMORY) between all the event buffers that are specific to the mode used: None: 3 buffers (fixed)Node: 3 * number_of_nodesCPU: 2.5 * number_of_cpus Here are some examples of what this means for different Node/CPU counts: Configuration None Node CPU 2 CPUs, 1 Node 3 buffers 3 buffers 5 buffers 6 CPUs, 2 Node 3 buffers 6 buffers 15 buffers 40 CPUs, 5 Nodes 3 buffers 15 buffers 100 buffers Aside: Buffer size on multi-processor computersAs the number of Nodes or CPUs increases, the size of the event buffer gets smaller because the total memory is sub-divided into more pieces. The defaults will hold up to this for a while since each buffer set is holding events only from the Node or CPU that it is associated with, but at some point the buffers will get too small and you’ll either see events being dropped or you’ll get an error when you create your session because you’re below the minimum buffer size. Increase the MAX_MEMORY setting to an appropriate number for the configuration. The most likely reason to start partitioning is going to be related to performance. If you notice that running an event session is impacting the performance of your server beyond a reasonably expected level [Yes, there is a reasonably expected level of work required to collect events.] then partitioning might be an answer. Before you partition you might want to check a few other things: Is your event retention set to NO_EVENT_LOSS and causing blocking? (I told you not to do this.) Consider changing your event loss mode or increasing memory. Are you over collecting and causing more work than necessary? Consider adding predicates to events or removing unnecessary events and actions from your session. Are you writing the file target to the same slow disk that you use for TempDB and your other high activity databases? <kidding> <not really> It’s always worth considering the end to end picture – if you’re writing events to a file you can be impacted by I/O, network; all the usual stuff. Assuming you’ve ruled out the obvious (and not so obvious) issues, there are performance conditions that will be addressed by partitioning. For example, it’s possible to have a successful event session (eg. no dropped events) but still see a performance impact because you have many CPUs all attempting to write to the same free buffer and having to wait in line to finish their work. This is a case where partitioning would relieve the contention between the different CPUs and likely reduce the performance impact cause by the event session. There is no DMV you can check to find these conditions – sorry – that’s where the art comes in. This is largely a matter of experimentation. On the bright side you probably won’t need to to worry about this level of detail all that often. The performance impact of Extended Events is significantly lower than what you may be used to with SQL Trace. You will likely only care about the impact if you are trying to set up a long running event session that will be part of your everyday workload – sessions used for short term troubleshooting will likely fall into the “reasonably expected impact” category. Hey buddy – I think you forgot something OK, there are two options I didn’t cover: STARTUP_STATE & TRACK_CAUSALITY. If you want your event sessions to start automatically when the server starts, set the STARTUP_STATE option to ON. (Now there is only one option I didn’t cover.) I’m going to leave causality for another post since it’s not really related to session behavior, it’s more about event analysis. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Hello Operator, My Switch Is Bored

- by Paul White

This is a post for T-SQL Tuesday #43 hosted by my good friend Rob Farley. The topic this month is Plan Operators. I haven’t taken part in T-SQL Tuesday before, but I do like to write about execution plans, so this seemed like a good time to start. This post is in two parts. The first part is primarily an excuse to use a pretty bad play on words in the title of this blog post (if you’re too young to know what a telephone operator or a switchboard is, I hate you). The second part of the post looks at an invisible query plan operator (so to speak). 1. My Switch Is Bored Allow me to present the rare and interesting execution plan operator, Switch: Books Online has this to say about Switch: Following that description, I had a go at producing a Fast Forward Cursor plan that used the TOP operator, but had no luck. That may be due to my lack of skill with cursors, I’m not too sure. The only application of Switch in SQL Server 2012 that I am familiar with requires a local partitioned view: CREATE TABLE dbo.T1 (c1 int NOT NULL CHECK (c1 BETWEEN 00 AND 24)); CREATE TABLE dbo.T2 (c1 int NOT NULL CHECK (c1 BETWEEN 25 AND 49)); CREATE TABLE dbo.T3 (c1 int NOT NULL CHECK (c1 BETWEEN 50 AND 74)); CREATE TABLE dbo.T4 (c1 int NOT NULL CHECK (c1 BETWEEN 75 AND 99)); GO CREATE VIEW V1 AS SELECT c1 FROM dbo.T1 UNION ALL SELECT c1 FROM dbo.T2 UNION ALL SELECT c1 FROM dbo.T3 UNION ALL SELECT c1 FROM dbo.T4; Not only that, but it needs an updatable local partitioned view. We’ll need some primary keys to meet that requirement: ALTER TABLE dbo.T1 ADD CONSTRAINT PK_T1 PRIMARY KEY (c1); ALTER TABLE dbo.T2 ADD CONSTRAINT PK_T2 PRIMARY KEY (c1); ALTER TABLE dbo.T3 ADD CONSTRAINT PK_T3 PRIMARY KEY (c1); ALTER TABLE dbo.T4 ADD CONSTRAINT PK_T4 PRIMARY KEY (c1); We also need an INSERT statement that references the view. Even more specifically, to see a Switch operator, we need to perform a single-row insert (multi-row inserts use a different plan shape): INSERT dbo.V1 (c1) VALUES (1); And now…the execution plan: The Constant Scan manufactures a single row with no columns. The Compute Scalar works out which partition of the view the new value should go in. The Assert checks that the computed partition number is not null (if it is, an error is returned). The Nested Loops Join executes exactly once, with the partition id as an outer reference (correlated parameter). The Switch operator checks the value of the parameter and executes the corresponding input only. If the partition id is 0, the uppermost Clustered Index Insert is executed, adding a row to table T1. If the partition id is 1, the next lower Clustered Index Insert is executed, adding a row to table T2…and so on. In case you were wondering, here’s a query and execution plan for a multi-row insert to the view: INSERT dbo.V1 (c1) VALUES (1), (2); Yuck! An Eager Table Spool and four Filters! I prefer the Switch plan. My guess is that almost all the old strategies that used a Switch operator have been replaced over time, using things like a regular Concatenation Union All combined with Start-Up Filters on its inputs. Other new (relative to the Switch operator) features like table partitioning have specific execution plan support that doesn’t need the Switch operator either. This feels like a bit of a shame, but perhaps it is just nostalgia on my part, it’s hard to know. Please do let me know if you encounter a query that can still use the Switch operator in 2012 – it must be very bored if this is the only possible modern usage! 2. Invisible Plan Operators The second part of this post uses an example based on a question Dave Ballantyne asked using the SQL Sentry Plan Explorer plan upload facility. If you haven’t tried that yet, make sure you’re on the latest version of the (free) Plan Explorer software, and then click the Post to SQLPerformance.com button. That will create a site question with the query plan attached (which can be anonymized if the plan contains sensitive information). Aaron Bertrand and I keep a close eye on questions there, so if you have ever wanted to ask a query plan question of either of us, that’s a good way to do it. The problem The issue I want to talk about revolves around a query issued against a calendar table. The script below creates a simplified version and adds 100 years of per-day information to it: USE tempdb; GO CREATE TABLE dbo.Calendar ( dt date NOT NULL, isWeekday bit NOT NULL, theYear smallint NOT NULL, CONSTRAINT PK__dbo_Calendar_dt PRIMARY KEY CLUSTERED (dt) ); GO -- Monday is the first day of the week for me SET DATEFIRST 1; -- Add 100 years of data INSERT dbo.Calendar WITH (TABLOCKX) (dt, isWeekday, theYear) SELECT CA.dt, isWeekday = CASE WHEN DATEPART(WEEKDAY, CA.dt) IN (6, 7) THEN 0 ELSE 1 END, theYear = YEAR(CA.dt) FROM Sandpit.dbo.Numbers AS N CROSS APPLY ( VALUES (DATEADD(DAY, N.n - 1, CONVERT(date, '01 Jan 2000', 113))) ) AS CA (dt) WHERE N.n BETWEEN 1 AND 36525; The following query counts the number of weekend days in 2013: SELECT Days = COUNT_BIG(*) FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; It returns the correct result (104) using the following execution plan: The query optimizer has managed to estimate the number of rows returned from the table exactly, based purely on the default statistics created separately on the two columns referenced in the query’s WHERE clause. (Well, almost exactly, the unrounded estimate is 104.289 rows.) There is already an invisible operator in this query plan – a Filter operator used to apply the WHERE clause predicates. We can see it by re-running the query with the enormously useful (but undocumented) trace flag 9130 enabled: Now we can see the full picture. The whole table is scanned, returning all 36,525 rows, before the Filter narrows that down to just the 104 we want. Without the trace flag, the Filter is incorporated in the Clustered Index Scan as a residual predicate. It is a little bit more efficient than using a separate operator, but residual predicates are still something you will want to avoid where possible. The estimates are still spot on though: Anyway, looking to improve the performance of this query, Dave added the following filtered index to the Calendar table: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear) WHERE isWeekday = 0; The original query now produces a much more efficient plan: Unfortunately, the estimated number of rows produced by the seek is now wrong (365 instead of 104): What’s going on? The estimate was spot on before we added the index! Explanation You might want to grab a coffee for this bit. Using another trace flag or two (8606 and 8612) we can see that the cardinality estimates were exactly right initially: The highlighted information shows the initial cardinality estimates for the base table (36,525 rows), the result of applying the two relational selects in our WHERE clause (104 rows), and after performing the COUNT_BIG(*) group by aggregate (1 row). All of these are correct, but that was before cost-based optimization got involved :) Cost-based optimization When cost-based optimization starts up, the logical tree above is copied into a structure (the ‘memo’) that has one group per logical operation (roughly speaking). The logical read of the base table (LogOp_Get) ends up in group 7; the two predicates (LogOp_Select) end up in group 8 (with the details of the selections in subgroups 0-6). These two groups still have the correct cardinalities as trace flag 8608 output (initial memo contents) shows: During cost-based optimization, a rule called SelToIdxStrategy runs on group 8. It’s job is to match logical selections to indexable expressions (SARGs). It successfully matches the selections (theYear = 2013, is Weekday = 0) to the filtered index, and writes a new alternative into the memo structure. The new alternative is entered into group 8 as option 1 (option 0 was the original LogOp_Select): The new alternative is to do nothing (PhyOp_NOP = no operation), but to instead follow the new logical instructions listed below the NOP. The LogOp_GetIdx (full read of an index) goes into group 21, and the LogOp_SelectIdx (selection on an index) is placed in group 22, operating on the result of group 21. The definition of the comparison ‘the Year = 2013’ (ScaOp_Comp downwards) was already present in the memo starting at group 2, so no new memo groups are created for that. New Cardinality Estimates The new memo groups require two new cardinality estimates to be derived. First, LogOp_Idx (full read of the index) gets a predicted cardinality of 10,436. This number comes from the filtered index statistics: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH STAT_HEADER; The second new cardinality derivation is for the LogOp_SelectIdx applying the predicate (theYear = 2013). To get a number for this, the cardinality estimator uses statistics for the column ‘theYear’, producing an estimate of 365 rows (there are 365 days in 2013!): DBCC SHOW_STATISTICS (Calendar, theYear) WITH HISTOGRAM; This is where the mistake happens. Cardinality estimation should have used the filtered index statistics here, to get an estimate of 104 rows: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH HISTOGRAM; Unfortunately, the logic has lost sight of the link between the read of the filtered index (LogOp_GetIdx) in group 22, and the selection on that index (LogOp_SelectIdx) that it is deriving a cardinality estimate for, in group 21. The correct cardinality estimate (104 rows) is still present in the memo, attached to group 8, but that group now has a PhyOp_NOP implementation. Skipping over the rest of cost-based optimization (in a belated attempt at brevity) we can see the optimizer’s final output using trace flag 8607: This output shows the (incorrect, but understandable) 365 row estimate for the index range operation, and the correct 104 estimate still attached to its PhyOp_NOP. This tree still has to go through a few post-optimizer rewrites and ‘copy out’ from the memo structure into a tree suitable for the execution engine. One step in this process removes PhyOp_NOP, discarding its 104-row cardinality estimate as it does so. To finish this section on a more positive note, consider what happens if we add an OVER clause to the query aggregate. This isn’t intended to be a ‘fix’ of any sort, I just want to show you that the 104 estimate can survive and be used if later cardinality estimation needs it: SELECT Days = COUNT_BIG(*) OVER () FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; The estimated execution plan is: Note the 365 estimate at the Index Seek, but the 104 lives again at the Segment! We can imagine the lost predicate ‘isWeekday = 0’ as sitting between the seek and the segment in an invisible Filter operator that drops the estimate from 365 to 104. Even though the NOP group is removed after optimization (so we don’t see it in the execution plan) bear in mind that all cost-based choices were made with the 104-row memo group present, so although things look a bit odd, it shouldn’t affect the optimizer’s plan selection. I should also mention that we can work around the estimation issue by including the index’s filtering columns in the index key: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear, isWeekday) WHERE isWeekday = 0 WITH (DROP_EXISTING = ON); There are some downsides to doing this, including that changes to the isWeekday column may now require Halloween Protection, but that is unlikely to be a big problem for a static calendar table ;) With the updated index in place, the original query produces an execution plan with the correct cardinality estimation showing at the Index Seek: That’s all for today, remember to let me know about any Switch plans you come across on a modern instance of SQL Server! Finally, here are some other posts of mine that cover other plan operators: Segment and Sequence Project Common Subexpression Spools Why Plan Operators Run Backwards Row Goals and the Top Operator Hash Match Flow Distinct Top N Sort Index Spools and Page Splits Singleton and Range Seeks Bitmaps Hash Join Performance Compute Scalar © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
SQL SERVER – Weekly Series – Memory Lane – #038

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 CASE Statement in ORDER BY Clause – ORDER BY using Variable This article is as per request from the Application Development Team Leader of my company. His team encountered code where the application was preparing string for ORDER BY clause of the SELECT statement. Application was passing this string as variable to Stored Procedure (SP) and SP was using EXEC to execute the SQL string. This is not good for performance as Stored Procedure has to recompile every time due to EXEC. sp_executesql can do the same task but still not the best performance. SSMS – View/Send Query Results to Text/Grid/Files Results to Text – CTRL + T Results to Grid – CTRL + D Results to File – CTRL + SHIFT + F 2008 Introduction to SPARSE Columns Part 2 I wrote about Introduction to SPARSE Columns Part 1. Let us understand the concept of the SPARSE column in more detail. I suggest you read the first part before continuing reading this article. All SPARSE columns are stored as one XML column in the database. Let us see some of the advantage and disadvantage of SPARSE column. Deferred Name Resolution How come when table name is incorrect SP can be created successfully but when an incorrect column is used SP cannot be created? 2009 Backup Timeline and Understanding of Database Restore Process in Full Recovery Model In general, databases backup in full recovery mode is taken in three different kinds of database files. Full Database Backup Differential Database Backup Log Backup Restore Sequence and Understanding NORECOVERY and RECOVERY While doing RESTORE Operation if you restoring database files, always use NORECOVER option as that will keep the database in a state where more backup file are restored. This will also keep database offline also to prevent any changes, which can create itegrity issues. Once all backup file is restored run RESTORE command with a RECOVERY option to get database online and operational. Four Different Ways to Find Recovery Model for Database Perhaps, the best thing about technical domain is that most of the things can be executed in more than one ways. It is always useful to know about the various methods of performing a single task. Two Methods to Retrieve List of Primary Keys and Foreign Keys of Database When Information Schema is used, we will not be able to discern between primary key and foreign key; we will have both the keys together. In the case of sys schema, we can query the data in our preferred way and can join this table to another table, which can retrieve additional data from the same. Get Last Running Query Based on SPID PID is returns sessions ID of the current user process. The acronym SPID comes from the name of its earlier version, Server Process ID. 2010 SELECT * FROM dual – Dual Equivalent Dual is a table that is created by Oracle together with data dictionary. It consists of exactly one column named “dummy”, and one record. The value of that record is X. You can check the content of the DUAL table using the following syntax. SELECT * FROM dual Identifying Statistics Used by Query Someone asked this question in my training class of query optimization and performance tuning. “Can I know which statistics were used by my query?” 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 14 of 31 What are the basic functions for master, msdb, model, tempdb and resource databases? What is the Maximum Number of Index per Table? Explain Few of the New Features of SQL Server 2008 Management Studio Explain IntelliSense for Query Editing Explain MultiServer Query Explain Query Editor Regions Explain Object Explorer Enhancements Explain Activity Monitors SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 15 of 31 What is Service Broker? Where are SQL server Usernames and Passwords Stored in the SQL server? What is Policy Management? What is Database Mirroring? What are Sparse Columns? What does TOP Operator Do? What is CTE? What is MERGE Statement? What is Filtered Index? Which are the New Data Types Introduced in SQL SERVER 2008? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 16 of 31 What are the Advantages of Using CTE? How can we Rewrite Sub-Queries into Simple Select Statements or with Joins? What is CLR? What are Synonyms? What is LINQ? What are Isolation Levels? What is Use of EXCEPT Clause? What is XPath? What is NOLOCK? What is the Difference between Update Lock and Exclusive Lock? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 17 of 31 How will you Handle Error in SQL SERVER 2008? What is RAISEERROR? What is RAISEERROR? How to Rebuild the Master Database? What is the XML Datatype? What is Data Compression? What is Use of DBCC Commands? How to Copy the Tables, Schema and Views from one SQL Server to Another? How to Find Tables without Indexes? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 18 of 31 How to Copy Data from One Table to Another Table? What is Catalog Views? What is PIVOT and UNPIVOT? What is a Filestream? What is SQLCMD? What do you mean by TABLESAMPLE? What is ROW_NUMBER()? What are Ranking Functions? What is Change Data Capture (CDC) in SQL Server 2008? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 19 of 31 How can I Track the Changes or Identify the Latest Insert-Update-Delete from a Table? What is the CPU Pressure? How can I Get Data from a Database on Another Server? What is the Bookmark Lookup and RID Lookup? What is Difference between ROLLBACK IMMEDIATE and WITH NO_WAIT during ALTER DATABASE? What is Difference between GETDATE and SYSDATETIME in SQL Server 2008? How can I Check that whether Automatic Statistic Update is Enabled or not? How to Find Index Size for Each Index on Table? What is the Difference between Seek Predicate and Predicate? What are Basics of Policy Management? What are the Advantages of Policy Management? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 20 of 31 What are Policy Management Terms? What is the ‘FILLFACTOR’? Where in MS SQL Server is ’100’ equal to ‘0’? What are Points to Remember while Using the FILLFACTOR Argument? What is a ROLLUP Clause? What are Various Limitations of the Views? What is a Covered index? When I Delete any Data from a Table, does the SQL Server reduce the size of that table? What are Wait Types? How to Stop Log File Growing too Big? If any Stored Procedure is Encrypted, then can we see its definition in Activity Monitor? 2012 Example of Width Sensitive and Width Insensitive Collation Width Sensitive Collation: A single-byte character (half-width) represented as single-byte and the same character represented as a double-byte character (full-width) are when compared are not equal the collation is width sensitive. In this example we have one table with two columns. One column has a collation of width sensitive and the second column has a collation of width insensitive. Find Column Used in Stored Procedure – Search Stored Procedure for Column Name Very interesting conversation about how to find column used in a stored procedure. There are two different characters in the story and both are having a conversation about how to find column in the stored procedure. Here are two part story Part 1 | Part 2 SQL SERVER – 2012 Functions – FORMAT() and CONCAT() – An Interesting Usage Generate Script for Schema and Data – SQL in Sixty Seconds #021 – Video In simple words, in many cases the database move from one place to another place. It is not always possible to back up and restore databases. There are possibilities when only part of the database (with schema and data) has to be moved. In this video we learn that we can easily generate script for schema for data and move from one server to another one. INFORMATION_SCHEMA.COLUMNS and Value Character Maximum Length -1 I often see the value -1 in the CHARACTER_MAXIMUM_LENGTH column of INFORMATION_SCHEMA.COLUMNS table. I understand that the length of any column can be between 0 to large number but I do not get it when I see value in negative (i.e. -1). Any insight on this subject? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Spooling in SQL execution plans

- by Rob Farley

Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

Read the article
Table Variables: an empirical approach.

- by Phil Factor

It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter as ‘Horrible’, and to have it criticized on the MVP forum. When this happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these critics were experts whose views I respect. What was my crime? It was, I think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number of rows. What are these quirks? Well, table variables have advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make assumptions about the data rather than using column distribution statistics when a table variable is involved in a join, because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved in queries. However complex the heuristics that are used might be in determining the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful. The experienced developer or DBA will be on the lookout for this sort of problem. In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate the quirks, and include a subsequent example supplied by Kevin Boles. A simplified example. We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget indexes altogether for this example, and use heaps. We’ll try the same Join with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables. It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary table up to 10,000 rows. Actually, even without a use of the OPTION (RECOMPILE) hint, it is doing well. What happens when your table size increases? The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows, at least. It is performing better than a Temporary table, and in a good linear fashion. What about mixed table joins, where you are joining a temporary table to a table variable? You’d probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows since we know the bigger picture now) Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re dealing with a very complex beast: the Query Optimizer. It can come up with surprises What if we change the query very slightly to insert the results into a Table Variable? We change nothing else and just measure the execution time of the statement as before. Suddenly, the table variable isn’t looking so much better, even taking into account the time involved in doing the table insert. OK, if you haven’t used OPTION (RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the Table variable and the temporary table. The table variable is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the 8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t use constraints, you’re going to need that Option (RECOMPILE) hint. Count Dracula and the Horror Join. These tables of integers provide a rather unreal example, so let’s try a rather different example, and get stuck into some implicit indexing, by using constraints. What unusual words are contained in the book ‘Dracula’ by Bram Stoker? Here we get a table of all the common words in the English language (60,387 of them) and put them in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the right-hand value is null. The results show a huge variation, between the sublime and the gorblimey. If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms. If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms. If both Table Variables use a unique constraint, then the query takes 36 Ms. If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!! If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table, it took 113 Ms. If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.If both tables are temporary tables and both have primary keys, it took 46 Ms. Here we see table variables which are joined on their primary key again enjoying a slight performance advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans.So where have we got to? Without drilling down into the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your tables are relatively small, they are faster than temporary tables, but as the number of rows increases you need to do one of two things: either you need to have a primary key on the column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably according to context). Kevin’s Skew In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very skewed table. In the original, pasted into my article as a comment, a column consisted of 100000 rows in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan. This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the temporary table and then all was well. Undeniably, real data does occasionally cause problems to the performance of joins in Table Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real live data. As in Kevin’s example, indexes merely make matters worse, and the OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table. However, one has to note that once the slight de-skew had taken place, then the plans were identical across a huge range. Conclusions Where you need to hold intermediate results as part of a process, Table Variables offer a good alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and when the nature of the processing makes an index seek very effective. Table Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware, however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor execution plans, and this must be cured either by adding constraints (UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is impossible. Occasionally, the way that the data is distributed prevents the efficient use of Table Variables, and this will require using a temporary table instead. Tables Variables require some awareness by the developer about the potential hazards and how to avoid them. If you are not prepared to do any performance monitoring of your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables without considering the niceties of processing just what is required and no more, then temporary tables provide a safer and less fragile means-to-an-end for you.

Read the article
Beware Sneaky Reads with Unique Indexes

- by Paul White NZ

A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest ( pk INTEGER IDENTITY NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( some_value, lob_data ) SELECT TOP (1000) N.n, @Data FROM Numbers N WHERE N.n <= 1000; Test 1: A Simple Update Let’s run a query to subtract one from every value in the some_value column: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources. The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time. The query plan is also very simple: Looking at the Clustered Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan: The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed. The some_value column is used by the Compute Scalar to calculate the new value. (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT). Test 2: Simple Update with an Index Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column: CREATE NONCLUSTERED INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); This is not a useful index for our simple update query; imagine that someone else created it for a different purpose. Let’s run our update query again: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms. The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index. The query plan is very similar to before (click to enlarge): The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index. The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update. SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post on non-updating updates for details). Our simple query does change the value of some_data in every row, so this optimization doesn’t add any value in this specific case. The output list of columns from the Clustered Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_data columns. Cool. Overall then, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table. Let’s see what happens if we make the nonclustered index unique. Test 3: Simple Update with a Unique Index Here’s the script to create a new unique index, and drop the old one: CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); GO DROP INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest; Remember that SQL Server only enforces uniqueness on index keys (the some_data column). The lob_data column is simply stored at the leaf-level of the non-clustered index. With that in mind, we might expect this change to make very little difference. Let’s see: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; Whoa! Now look at the elapsed time and logical reads: Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. CPU time = 172 ms, elapsed time = 16172 ms. Even with all the data and index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead). Why on earth is SQL Server reading LOB data in a query that only updates a single integer column? The Query Plan The query plan for test 3 looks a bit more complex than before: In fact, the bottom level is exactly the same as we saw with the non-unique index. The top level has heaps of new stuff though, which I’ll come to in a moment. You might be expecting to find that the Clustered Index Scan is now reading the lob_data column (for some reason). After all, we need to explain where all the LOB logical reads are coming from. Sadly, when we look at the properties of the Clustered Index Scan, we see exactly the same as before: SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads? Updates that Sneakily Read Data We have to go as far as the Clustered Index Update operator before we see LOB data in the output list: [Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier). The Clustered Index Update operator adds two new columns: the lob_data column, and some_value_OLD. The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column. At this point, the clustered index has already been updated with the new value, but we haven’t touched the nonclustered index yet. An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its update operation. SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered Index Update. The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then. Sneaky! Why the LOB Data Is Needed This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update? The answer relates to the top row of the query plan for test 3. I’ll reproduce it here for convenience: Notice that this is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index too. I’ll talk more about this difference shortly. The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered index. Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4. The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4. By applying net changes, SQL Server can also avoid false unique-key violations. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail. You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down. That’s fine, but it’s not possible to generalize this logic to work with every possible update query. SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible. Another benefit of this optimization is that it includes a sort on the index key as part of its work. Processing the index changes in index key order promotes sequential I/O against the nonclustered index. A side-effect of all this is that the net changes might include one or more inserts. In order to insert a new row in the index, SQL Server obviously needs all the columns – the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered Index Update. In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs). In order to generate the correct index key delete operation, it needs the old key value. The irony is that in this case the Split/Sort/Collapse optimization is anything but. Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it. Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates. Beating the Set-Based Update with a Cursor One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated. Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data: SET NOCOUNT ON; SET STATISTICS XML, IO, TIME OFF; DECLARE @PK INTEGER, @StartTime DATETIME; SET @StartTime = GETUTCDATE(); DECLARE curUpdate CURSOR LOCAL FORWARD_ONLY KEYSET SCROLL_LOCKS FOR SELECT L.pk FROM LOBtest L ORDER BY L.pk ASC; OPEN curUpdate; WHILE (1 = 1) BEGIN FETCH NEXT FROM curUpdate INTO @PK; IF @@FETCH_STATUS = -1 BREAK; IF @@FETCH_STATUS = -2 CONTINUE; UPDATE dbo.LOBtest SET some_value = some_value - 1 WHERE CURRENT OF curUpdate; END; CLOSE curUpdate; DEALLOCATE curUpdate; SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE()); That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!) I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it. One could just as well use a WHERE clause that specified the primary key value instead. Clustered Indexes A clustered index is the ultimate index with included columns: all non-key columns are included columns in a clustered index. Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes: IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL DROP TABLE dbo.LOBtest; GO CREATE TABLE dbo.LOBtest ( pk INTEGER NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( pk, some_value, lob_data ) SELECT TOP (1000) N.n, N.n, @Data FROM Numbers N WHERE N.n <= 1000; Now here’s a query to modify the cluster keys: UPDATE dbo.LOBtest SET pk = pk + 1; The query plan is: As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection. In addition, SQL Server now has no choice but to read the LOB data in the Clustered Index Scan: The performance is not great, as you might expect (even though there is no non-clustered index to maintain): Table 'LOBtest'. Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. Table 'Worktable'. Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0, lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000. SQL Server Execution Times: CPU time = 483 ms, elapsed time = 17884 ms. Notice how the LOB data is read twice: once from the Clustered Index Scan, and again from the work table in tempdb used by the Eager Spool. If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns): A unique non-clustered index (on a heap) works well too: Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. (In fact the heap is rather more efficient). There are lots more fun combinations to try that I don’t have space for here. Final Thoughts The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article

Search Results

Search found 150 results on 6 pages for 'tempdb'.

Page 6/6 | < Previous Page | 2 3 4 5 6

- by cshah

- by John Stephen

- by John Stephen

- by Mark

- by John Stephen

- by Richard Slater

- by pinaldave

- by pinaldave

- by pinaldave

- by Fatherjack

- by Johnm

- by Rodney Landrum

- by Ben Challenor

- by Yousui

- by pinaldave

- by Pinal Dave

- by Maria Zakourdaev

- by Paul White NZ

- by extended_events

- by extended_events

- by Paul White

- by Pinal Dave

- by Rob Farley

- by Phil Factor

- by Paul White NZ

< Previous Page | 2 3 4 5 6