Archives for Mon Jun 24 2013 - Page 10

Attention users running SQL Server 2008 & 2008 R2!

- by AaronBertrand

In April and May, Microsoft released cumulative updates for SQL Server 2008 and 2008 R2 (I blogged about them here and here ). They are: CU #11 for 2008 SP3 (10.00.5840) ( KB #2834048 ) CU #12 for 2008 R2 SP1 (10.50.2874) ( KB #2828727 ) CU #6 for 2008 R2 SP2 (10.50.4279) ( KB #2830140 ) Sometime after that, looks like the next day, both downloads were pulled, allegedly due to an index corruption issue (if you believe the commentary on the Release Services blog post for CU #6 ) or due to an issue...(read more)

Read the article

DBCC CHECKDB on VVLDB and latches (Or: My Pain is Your Gain)

- by Argenis

Does your CHECKDB hurt, Argenis? There is a classic blog series by Paul Randal [blog|twitter] called “CHECKDB From Every Angle” which is pretty much mandatory reading for anybody who’s even remotely considering going for the MCM certification, or its replacement (the Microsoft Certified Solutions Master: Data Platform – makes my fingers hurt just from typing it). Of particular interest is the post “Consistency Options for a VLDB” – on it, Paul provides solid, timeless advice (I use the word “timeless” because it was written in 2007, and it all applies today!) on how to perform checks on very large databases. Well, here I was trying to figure out how to make CHECKDB run faster on a restored copy of one of our databases, which happens to exceed 7TB in size. The whole thing was taking several days on multiple systems, regardless of the storage used – SAS, SATA or even SSD…and I actually didn’t pay much attention to how long it was taking, or even bothered to look at the reasons why - as long as it was finishing okay and found no consistency errors. Yes – I know. That was a huge mistake, as corruption found in a database several days after taking place could only allow for further spread of the corruption – and potentially large data loss. In the last two weeks I increased my attention towards this problem, as we noticed that CHECKDB was taking EVEN LONGER on brand new all-flash storage in the SAN! I couldn’t really explain it, and were almost ready to blame the storage vendor. The vendor told us that they could initially see the server driving decent I/O – around 450Mb/sec, and then it would settle at a very slow rate of 10Mb/sec or so. “Hum”, I thought – “CHECKDB is just not pushing the I/O subsystem hard enough”. Perfmon confirmed the vendor’s observations. Dreaded @BlobEater What was CHECKDB doing all the time while doing so little I/O? Eating Blobs. It turns out that CHECKDB was taking an extremely long time on one of our frankentables, which happens to be have 35 billion rows (yup, with a b) and sucks up several terabytes of space in the database. We do have a project ongoing to purge/split/partition this table, so it’s just a matter of time before we deal with it. But the reality today is that CHECKDB is coming to a screeching halt in performance when dealing with this particular table. Checking sys.dm_os_waiting_tasks and sys.dm_os_latch_stats showed that LATCH_EX (DBCC_OBJECT_METADATA) was by far the top wait type. I remembered hearing recently about that wait from another post that Paul Randal made, but that was related to computed-column indexes, and in fact, Paul himself reminded me of his article via twitter. But alas, our pathologic table had no non-clustered indexes on computed columns. I knew that latches are used by the database engine to do internal synchronization – but how could I help speed this up? After all, this is stuff that doesn’t have a lot of knobs to tweak. (There’s a fantastic level 500 talk by Bob Ward from Microsoft CSS [blog|twitter] called “Inside SQL Server Latches” given at PASS 2010 – and you can check it out here. DISCLAIMER: I assume no responsibility for any brain melting that might ensue from watching Bob’s talk!) Failed Hypotheses Earlier on this week I flew down to Palo Alto, CA, to visit our Headquarters – and after having a great time with my Monkey peers, I was relaxing on the plane back to Seattle watching a great talk by SQL Server MVP and fellow MCM Maciej Pilecki [twitter] called “Masterclass: A Day in the Life of a Database Transaction” where he discusses many different topics related to transaction management inside SQL Server. Very good stuff, and when I got home it was a little late – that slow DBCC CHECKDB that I had been dealing with was way in the back of my head. As I was looking at the problem at hand earlier on this week, I thought “How about I set the database to read-only?” I remembered one of the things Maciej had (jokingly) said in his talk: “if you don’t want locking and blocking, set the database to read-only” (or something to that effect, pardon my loose memory). I immediately killed the CHECKDB which had been running painfully for days, and set the database to read-only mode. Then I ran DBCC CHECKDB against it. It started going really fast (even a bit faster than before), and then throttled down again to around 10Mb/sec. All sorts of expletives went through my head at the time. Sure enough, the same latching scenario was present. Oh well. I even spent some time trying to figure out if NUMA was hurting performance. Folks on Twitter made suggestions in this regard (thanks, Lonny! [twitter]) …Eureka? This past Friday I was still scratching my head about the whole thing; I was ready to start profiling with XPERF to see if I could figure out which part of the engine was to blame and then get Microsoft to look at the evidence. After getting a bunch of good news I’ll blog about separately, I sat down for a figurative smack down with CHECKDB before the weekend. And then the light bulb went on. A sparse column. I thought that I couldn’t possibly be experiencing the same scenario that Paul blogged about back in March showing extreme latching with non-clustered indexes on computed columns. Did I even have a non-clustered index on my sparse column? As it turns out, I did. I had one filtered non-clustered index – with the sparse column as the index key (and only column). To prove that this was the problem, I went and setup a test. Yup, that'll do it The repro is very simple for this issue: I tested it on the latest public builds of SQL Server 2008 R2 SP2 (CU6) and SQL Server 2012 SP1 (CU4). First, create a test database and a test table, which only needs to contain a sparse column: CREATE DATABASE SparseColTest; GO USE SparseColTest; GO CREATE TABLE testTable (testCol smalldatetime SPARSE NULL); GO INSERT INTO testTable (testCol) VALUES (NULL); GO 1000000 That’s 1 million rows, and even though you’re inserting NULLs, that’s going to take a while. In my laptop, it took 3 minutes and 31 seconds. Next, we run DBCC CHECKDB against the database: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; This runs extremely fast, as least on my test rig – 198 milliseconds. Now let’s create a filtered non-clustered index on the sparse column: CREATE NONCLUSTERED INDEX [badBadIndex] ON testTable (testCol) WHERE testCol IS NOT NULL; With the index in place now, let’s run DBCC CHECKDB one more time: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; In my test system this statement completed in 11433 milliseconds. 11.43 full seconds. Quite the jump from 198 milliseconds. I went ahead and dropped the filtered non-clustered indexes on the restored copy of our production database, and ran CHECKDB against that. We went down from 7+ days to 19 hours and 20 minutes. Cue the “Argenis is not impressed” meme, please, Mr. LaRock. My pain is your gain, folks. Go check to see if you have any of such indexes – they’re likely causing your consistency checks to run very, very slow. Happy CHECKDBing, -Argenis ps: I plan to file a Connect item for this issue – I consider it a pretty serious bug in the engine. After all, filtered indexes were invented BECAUSE of the sparse column feature – and it makes a lot of sense to use them together. Watch this space and my twitter timeline for a link.

Read the article

Load Test Manifesto

- by jchang

Load testing used to be a standard part of the software development, but not anymore. Now people express a preference for assessing performance on the production system. There is a lack of confidence that a load test reflects what will actually happen in production. In essence, it has become accepted that the value of load testing is not worth the cost and time, and perhaps whether there is any value at all. The main problem is the load test plan criteria – excessive focus on perceived importance...(read more)

Read the article

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article

Excel 2013 Data Explorer and GeoFlow make 3-D maps quick and easy

- by John Paul Cook

Excel add-ins Data Explorer and GeoFlow work well together, mainly because they just work. Simple, fast, and powerful. I started Excel 2013, used Data Explorer to search for, examine, and then download latitude-longitude data and finally used GeoFlow to plot an interactive 3-D visualization. I didn’t use any fancy Excel commands and the entire process took less than 3 minutes. You can download the GeoFlow preview from here . It can also be used with Office 365. Start by clicking the DATA EXPLORER...(read more)

Read the article

Book review: SQL Server Transaction Log Management

- by Hugo Kornelis

It was an offer I could not resist. I was promised a free copy of one of the newest books from Red Gate Books , SQL Server Transaction Log Management (by Tony Davis and Gail Shaw ), with the caveat that I should write a review after reading it. Mind you, not a commercial, “make sure we sell more copies” kind of review, but a review of my actual thoughts. Yes, I got explicit permission to be my usual brutally honest self. A total win/win for me! First, I get a free book – and free is always good,...(read more)

Read the article

Plan Operator Tuesday round-up

- by Rob Farley

Eighteen posts for T-SQL Tuesday #43 this month, discussing Plan Operators. I put them together and made the following clickable plan. It’s 1000px wide, so I hope you have a monitor wide enough. Let me explain this plan for you (people’s names are the links to the articles on their blogs – the same links as in the plan above). It was clearly a SELECT statement. Wayne Sheffield (@dbawayne) wrote about that, so we start with a SELECT physical operator, leveraging the logical operator Wayne Sheffield. The SELECT operator calls the Paul White operator, discussed by Jason Brimhall (@sqlrnnr) in his post. The Paul White operator is quite remarkable, and can consume three streams of data. Let’s look at those streams. The first pulls data from a Table Scan – Boris Hristov (@borishristov)’s post – using parallel threads (Bradley Ball – @sqlballs) that pull the data eagerly through a Table Spool (Oliver Asmus – @oliverasmus). A scalar operation is also performed on it, thanks to Jeffrey Verheul (@devjef)’s Compute Scalar operator. The second stream of data applies Evil (I figured that must mean a procedural TVF, but could’ve been anything), courtesy of Jason Strate (@stratesql). It performs this Evil on the merging of parallel streams (Steve Jones – @way0utwest), which suck data out of a Switch (Paul White – @sql_kiwi). This Switch operator is consuming data from up to four lookups, thanks to Kalen Delaney (@sqlqueen), Rick Krueger (@dataogre), Mickey Stuewe (@sqlmickey) and Kathi Kellenberger (@auntkathi). Unfortunately Kathi’s name is a bit long and has been truncated, just like in real plans. The last stream performs a join of two others via a Nested Loop (Matan Yungman – @matanyungman). One pulls data from a Spool (my post – @rob_farley) populated from a Table Scan (Jon Morisi). The other applies a catchall operator (the catchall is because Tamera Clark (@tameraclark) didn’t specify any particular operator, and a catchall is what gets shown when SSMS doesn’t know what to show. Surprisingly, it’s showing the yellow one, which is about cursors. Hopefully that’s not what Tamera planned, but anyway...) to the output from an Index Seek operator (Sebastian Meine – @sqlity). Lastly, I think everyone put in 110% effort, so that’s what all the operators cost. That didn’t leave anything for me, unfortunately, but that’s okay. Also, because he decided to use the Paul White operator, Jason Brimhall gets 0%, and his 110% was given to Paul’s Switch operator post. I hope you’ve enjoyed this T-SQL Tuesday, and have learned something extra about Plan Operators. Keep your eye out for next month’s one by watching the Twitter Hashtag #tsql2sday, and why not contribute a post to the party? Big thanks to Adam Machanic as usual for starting all this. @rob_farley

Read the article

This November, Join Me in Stockholm and Amsterdam

- by Adam Machanic

Late last year, I was invited by Raoul Illyés, a SQL Server MVP from Denmark, to present a precon at the 2013 edition of SQLRally Nordic. I agreed and decided to skip the US PASS Summit this year and instead visit an area of Europe I've never seen before. A bonus came a while later when I learned that there is another SQLRally in Europe that same week: SQLRally Amsterdam. Things worked out in just the right way and today I'm happy to announce that I'll be speaking at both events, back-to-back. Should...(read more)

Read the article

Hello Operator, My Switch Is Bored

- by Paul White

This is a post for T-SQL Tuesday #43 hosted by my good friend Rob Farley. The topic this month is Plan Operators. I haven’t taken part in T-SQL Tuesday before, but I do like to write about execution plans, so this seemed like a good time to start. This post is in two parts. The first part is primarily an excuse to use a pretty bad play on words in the title of this blog post (if you’re too young to know what a telephone operator or a switchboard is, I hate you). The second part of the post looks at an invisible query plan operator (so to speak). 1. My Switch Is Bored Allow me to present the rare and interesting execution plan operator, Switch: Books Online has this to say about Switch: Following that description, I had a go at producing a Fast Forward Cursor plan that used the TOP operator, but had no luck. That may be due to my lack of skill with cursors, I’m not too sure. The only application of Switch in SQL Server 2012 that I am familiar with requires a local partitioned view: CREATE TABLE dbo.T1 (c1 int NOT NULL CHECK (c1 BETWEEN 00 AND 24)); CREATE TABLE dbo.T2 (c1 int NOT NULL CHECK (c1 BETWEEN 25 AND 49)); CREATE TABLE dbo.T3 (c1 int NOT NULL CHECK (c1 BETWEEN 50 AND 74)); CREATE TABLE dbo.T4 (c1 int NOT NULL CHECK (c1 BETWEEN 75 AND 99)); GO CREATE VIEW V1 AS SELECT c1 FROM dbo.T1 UNION ALL SELECT c1 FROM dbo.T2 UNION ALL SELECT c1 FROM dbo.T3 UNION ALL SELECT c1 FROM dbo.T4; Not only that, but it needs an updatable local partitioned view. We’ll need some primary keys to meet that requirement: ALTER TABLE dbo.T1 ADD CONSTRAINT PK_T1 PRIMARY KEY (c1); ALTER TABLE dbo.T2 ADD CONSTRAINT PK_T2 PRIMARY KEY (c1); ALTER TABLE dbo.T3 ADD CONSTRAINT PK_T3 PRIMARY KEY (c1); ALTER TABLE dbo.T4 ADD CONSTRAINT PK_T4 PRIMARY KEY (c1); We also need an INSERT statement that references the view. Even more specifically, to see a Switch operator, we need to perform a single-row insert (multi-row inserts use a different plan shape): INSERT dbo.V1 (c1) VALUES (1); And now…the execution plan: The Constant Scan manufactures a single row with no columns. The Compute Scalar works out which partition of the view the new value should go in. The Assert checks that the computed partition number is not null (if it is, an error is returned). The Nested Loops Join executes exactly once, with the partition id as an outer reference (correlated parameter). The Switch operator checks the value of the parameter and executes the corresponding input only. If the partition id is 0, the uppermost Clustered Index Insert is executed, adding a row to table T1. If the partition id is 1, the next lower Clustered Index Insert is executed, adding a row to table T2…and so on. In case you were wondering, here’s a query and execution plan for a multi-row insert to the view: INSERT dbo.V1 (c1) VALUES (1), (2); Yuck! An Eager Table Spool and four Filters! I prefer the Switch plan. My guess is that almost all the old strategies that used a Switch operator have been replaced over time, using things like a regular Concatenation Union All combined with Start-Up Filters on its inputs. Other new (relative to the Switch operator) features like table partitioning have specific execution plan support that doesn’t need the Switch operator either. This feels like a bit of a shame, but perhaps it is just nostalgia on my part, it’s hard to know. Please do let me know if you encounter a query that can still use the Switch operator in 2012 – it must be very bored if this is the only possible modern usage! 2. Invisible Plan Operators The second part of this post uses an example based on a question Dave Ballantyne asked using the SQL Sentry Plan Explorer plan upload facility. If you haven’t tried that yet, make sure you’re on the latest version of the (free) Plan Explorer software, and then click the Post to SQLPerformance.com button. That will create a site question with the query plan attached (which can be anonymized if the plan contains sensitive information). Aaron Bertrand and I keep a close eye on questions there, so if you have ever wanted to ask a query plan question of either of us, that’s a good way to do it. The problem The issue I want to talk about revolves around a query issued against a calendar table. The script below creates a simplified version and adds 100 years of per-day information to it: USE tempdb; GO CREATE TABLE dbo.Calendar ( dt date NOT NULL, isWeekday bit NOT NULL, theYear smallint NOT NULL, CONSTRAINT PK__dbo_Calendar_dt PRIMARY KEY CLUSTERED (dt) ); GO -- Monday is the first day of the week for me SET DATEFIRST 1; -- Add 100 years of data INSERT dbo.Calendar WITH (TABLOCKX) (dt, isWeekday, theYear) SELECT CA.dt, isWeekday = CASE WHEN DATEPART(WEEKDAY, CA.dt) IN (6, 7) THEN 0 ELSE 1 END, theYear = YEAR(CA.dt) FROM Sandpit.dbo.Numbers AS N CROSS APPLY ( VALUES (DATEADD(DAY, N.n - 1, CONVERT(date, '01 Jan 2000', 113))) ) AS CA (dt) WHERE N.n BETWEEN 1 AND 36525; The following query counts the number of weekend days in 2013: SELECT Days = COUNT_BIG(*) FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; It returns the correct result (104) using the following execution plan: The query optimizer has managed to estimate the number of rows returned from the table exactly, based purely on the default statistics created separately on the two columns referenced in the query’s WHERE clause. (Well, almost exactly, the unrounded estimate is 104.289 rows.) There is already an invisible operator in this query plan – a Filter operator used to apply the WHERE clause predicates. We can see it by re-running the query with the enormously useful (but undocumented) trace flag 9130 enabled: Now we can see the full picture. The whole table is scanned, returning all 36,525 rows, before the Filter narrows that down to just the 104 we want. Without the trace flag, the Filter is incorporated in the Clustered Index Scan as a residual predicate. It is a little bit more efficient than using a separate operator, but residual predicates are still something you will want to avoid where possible. The estimates are still spot on though: Anyway, looking to improve the performance of this query, Dave added the following filtered index to the Calendar table: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear) WHERE isWeekday = 0; The original query now produces a much more efficient plan: Unfortunately, the estimated number of rows produced by the seek is now wrong (365 instead of 104): What’s going on? The estimate was spot on before we added the index! Explanation You might want to grab a coffee for this bit. Using another trace flag or two (8606 and 8612) we can see that the cardinality estimates were exactly right initially: The highlighted information shows the initial cardinality estimates for the base table (36,525 rows), the result of applying the two relational selects in our WHERE clause (104 rows), and after performing the COUNT_BIG(*) group by aggregate (1 row). All of these are correct, but that was before cost-based optimization got involved :) Cost-based optimization When cost-based optimization starts up, the logical tree above is copied into a structure (the ‘memo’) that has one group per logical operation (roughly speaking). The logical read of the base table (LogOp_Get) ends up in group 7; the two predicates (LogOp_Select) end up in group 8 (with the details of the selections in subgroups 0-6). These two groups still have the correct cardinalities as trace flag 8608 output (initial memo contents) shows: During cost-based optimization, a rule called SelToIdxStrategy runs on group 8. It’s job is to match logical selections to indexable expressions (SARGs). It successfully matches the selections (theYear = 2013, is Weekday = 0) to the filtered index, and writes a new alternative into the memo structure. The new alternative is entered into group 8 as option 1 (option 0 was the original LogOp_Select): The new alternative is to do nothing (PhyOp_NOP = no operation), but to instead follow the new logical instructions listed below the NOP. The LogOp_GetIdx (full read of an index) goes into group 21, and the LogOp_SelectIdx (selection on an index) is placed in group 22, operating on the result of group 21. The definition of the comparison ‘the Year = 2013’ (ScaOp_Comp downwards) was already present in the memo starting at group 2, so no new memo groups are created for that. New Cardinality Estimates The new memo groups require two new cardinality estimates to be derived. First, LogOp_Idx (full read of the index) gets a predicted cardinality of 10,436. This number comes from the filtered index statistics: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH STAT_HEADER; The second new cardinality derivation is for the LogOp_SelectIdx applying the predicate (theYear = 2013). To get a number for this, the cardinality estimator uses statistics for the column ‘theYear’, producing an estimate of 365 rows (there are 365 days in 2013!): DBCC SHOW_STATISTICS (Calendar, theYear) WITH HISTOGRAM; This is where the mistake happens. Cardinality estimation should have used the filtered index statistics here, to get an estimate of 104 rows: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH HISTOGRAM; Unfortunately, the logic has lost sight of the link between the read of the filtered index (LogOp_GetIdx) in group 22, and the selection on that index (LogOp_SelectIdx) that it is deriving a cardinality estimate for, in group 21. The correct cardinality estimate (104 rows) is still present in the memo, attached to group 8, but that group now has a PhyOp_NOP implementation. Skipping over the rest of cost-based optimization (in a belated attempt at brevity) we can see the optimizer’s final output using trace flag 8607: This output shows the (incorrect, but understandable) 365 row estimate for the index range operation, and the correct 104 estimate still attached to its PhyOp_NOP. This tree still has to go through a few post-optimizer rewrites and ‘copy out’ from the memo structure into a tree suitable for the execution engine. One step in this process removes PhyOp_NOP, discarding its 104-row cardinality estimate as it does so. To finish this section on a more positive note, consider what happens if we add an OVER clause to the query aggregate. This isn’t intended to be a ‘fix’ of any sort, I just want to show you that the 104 estimate can survive and be used if later cardinality estimation needs it: SELECT Days = COUNT_BIG(*) OVER () FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; The estimated execution plan is: Note the 365 estimate at the Index Seek, but the 104 lives again at the Segment! We can imagine the lost predicate ‘isWeekday = 0’ as sitting between the seek and the segment in an invisible Filter operator that drops the estimate from 365 to 104. Even though the NOP group is removed after optimization (so we don’t see it in the execution plan) bear in mind that all cost-based choices were made with the 104-row memo group present, so although things look a bit odd, it shouldn’t affect the optimizer’s plan selection. I should also mention that we can work around the estimation issue by including the index’s filtering columns in the index key: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear, isWeekday) WHERE isWeekday = 0 WITH (DROP_EXISTING = ON); There are some downsides to doing this, including that changes to the isWeekday column may now require Halloween Protection, but that is unlikely to be a big problem for a static calendar table ;) With the updated index in place, the original query produces an execution plan with the correct cardinality estimation showing at the Index Seek: That’s all for today, remember to let me know about any Switch plans you come across on a modern instance of SQL Server! Finally, here are some other posts of mine that cover other plan operators: Segment and Sequence Project Common Subexpression Spools Why Plan Operators Run Backwards Row Goals and the Top Operator Hash Match Flow Distinct Top N Sort Index Spools and Page Splits Singleton and Range Seeks Bitmaps Hash Join Performance Compute Scalar © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article

T-SQL Tuesday: What kind of Bookmark are you using?

- by Kalen Delaney

I’m glad there is no minimum length requirement for T-SQL Tuesday blog posts , because this one will be short. I was in the classroom for almost 11 hours today, and I need to be back tomorrow morning at 7:30. Way long ago, back in SQL 2000 (or was it earlier?) when a query indicated that SQL Server was going to use a nonclustered index to get row pointers, and then look up those rows in the underlying table, the plan just had a very linear look to it. The operator that indicated going from the nonclustered...(read more)

Read the article

Spooling in SQL execution plans

- by Rob Farley

Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

Read the article

Are We Losing a Standard (Edition) Data Recovery Technology?

- by AllenMWhite

One of the coolest technologies Microsoft released with SQL Server 2005 was Database Mirroring, which provided the ability to have a failover copy of a database on another SQL Server instance, and have the ability to automatically failover to that copy should a problem occur with the primary database. What was even cooler was that this new technology was available on Standard Edition! Mom and Pop shops could afford to implement a high availability solution without paying an extra tens of thousands...(read more)

Read the article

Help Me Help You Fix That

- by BuckWoody

If you've been redirected here because you posted on a forum, or asked a question in an e-mail, the person wanted you to know how to get help quickly from a group of folks who are willing to do so - but whose time is valuable. You need to put a little effort into the question first to get others to assist. This is how to do that. It will only take you a moment to read... 1. State the problem succinctly in the title When an e-mail thread starts, or a forum post is the "head" of the conversation, you'll attract more helpers by using a descriptive headline than a vague one. This: "Driver for Epson Line Printer Not Installing on Operating System XYZ" Not this: "Can't print - PLEASE HELP" 2. Explain the Error Completely Make sure you include all pertinent information in the request. More information is better, there's almost no way to add too much data to the discussion. What you were doing, what happened, what you saw, the error message, visuals, screen shots, whatever you can include. This: "I'm getting error '5203 - Driver not compatible with Operating System since about 25 years ago' in a message box on the screen when I tried to run the SETUP.COM file from my older computer. It was a 1995 Compaq Proliant and worked correctly there.." Not this: "I get an error message in a box. It won't install." 3. Explain what you have done to research the problem If the first thing you do is ask a question without doing any research, you're lazy, and no one wants to help you. Using one of the many fine search engines you can most always find the answer to your problem. Sometimes you can't. Do yourself a favor - open a notepad app, and paste the URL's as you look them up. If you get your answer, don't save the note. If you don't get an answer, send the list along with the problem. It will show that you've tried, and also keep people from sending you links that you've already checked. This: "I read the fine manual, and it doesn't mention Operating System XYZ for some reason. Also, I checked the following links, but the instructions there didn't fix the problem: " Not this: <NULL> 4. Say "Please" and "Thank You" Remember, you're asking for help. No one owes you their valuable time. Ask politely, don't pester, endure the people who are rude to you, and when your question is answered, respond back to the thread or e-mail with a thank you to close it out. It helps others that have your same problem know that this is the correct answer. This: "I could really use some help here - if you have any pointers or things to try, I'd appreciate it." Not this: "I really need this done right now - why are there no responses?" This: "Thanks for those responses - that last one did the trick. Turns out I needed a new printer anyway, didn't realize they were so inexpensive now." Not this: <NULL> There are a lot of motivated people that will help you. Help them do that.

Read the article

Jetzt geht’s los - speaking in Germany!

- by Hugo Kornelis

It feels just like yesterday that I went to Munich for the very first German edition of SQL Saturday – and it was a great event. An agenda that was packed with three tracks of great sessions, and lots of fun with the organization, attendees, and other speakers. That was such a great time that I didn’t have to hesitate long before deciding that I wanted to repeat this event this year. Especially when I heard that it will be held in Rheinland, on July 13 – that is a distance I can travel by car! The...(read more)

Read the article

Cloud Computing Architecture Patterns: Don’t Focus on the Client

- by BuckWoody

Normally I try to put topics in the positive in other words "Do this" not "Don't do that". Sometimes its clearer to focus on what *not* to do. Popular development processes often start with screen mockups, or user input descriptions. In a scale-out pattern like Cloud Computing on Windows Azure, that's the wrong place to start. Start with the Data Instead, I recommend that you start with the data that a process requires. That data might be temporary or persisted, but starting with the data and its requirements helps to define not only the storage engine you need but also drives everything from security to the integrity of the application. For instance, assume the requirements show that the user must enter their phone number, and that this datum is used in a contact management system further down the application chain. For that datum, you can determine what data type you need (U.S. only or International?) the security requirements, whether it needs ACID compliance, how it will be searched, indexed and so on. From one small data point you can extrapolate out your options for storing and processing the data. Here's the interesting part, which begins to break the patterns that we've used for decades: all of the data doesn't have the same requirements. The phone number might be best suited for a list, or an element, or a string, with either BASE or ACID requirements, based on how it is used. That means we don't have to dump everything into XML, an RDBMS, a NoSQL engine, or a flat file exclusively. In fact, one record might use all of those depending on the use-case requirements. Next Is Data Management With the data defined, we can move on to how to store the data. Again, the requirements now dictate whether we need a full relational calculus or set-based operations, or we can choose another method based on the requirements for the data. And breaking another pattern its OK to store in more than once, in more than one location. We do this all the time for reporting systems and Business Intelligence systems, so this is a pattern we need to think about even for OLTP data. Move to Data Transport How does the data get around? We can use a connection-based method, sending the data along a transport to the storage engine, but in some cases we may want to use a cache, a queue, the Service Bus, or Complex Event Processing. Finally, Data Processing Most RDBMS engines, NoSQL, and certainly Big Data engines not only store data, but can process and manipulate it as well. Its doubtful that you'll calculate that phone number right? Well, if you're the phone company, you most certainly will. And so we see that even once we've chosen the data type, storage and engine, the same element can have different computing requirements based on how it is used. Sure, We Need A Front-End At Some Point Not all data is entered by human hands in fact most data isn't. We don't really need a Graphical User Interface (GUI) we need some way for a GUI to get data into and out of the systems listed earlier. But when we do need to allow users to enter or examine data, that should be left to the GUI that best fits the device the user has. Ever tried to use an application designed for a web browser on a phone? Or one designed for a tablet on a phone? Its usually quite painful. The siren song of "We'll just write one interface for all devices" is strong, and has beguiled many an unsuspecting architect. But they just don't work out. Instead, focus on the data, its transport and processing. Create API calls or a message system that allows for resilient transport to the device or interface, and let it do what it does best. References Microsoft Architecture Journal: http://msdn.microsoft.com/en-us/architecture/bb410935.aspx Patterns and Practices: http://msdn.microsoft.com/en-us/library/ff921345.aspx Windows Azure iOS, Android, Windows 8 Mobile Devices SDK: http://www.windowsazure.com/en-us/develop/mobile/tutorials/get-started-ios/ Windows Azure Facebook SDK: http://ntotten.com/2013/03/14/using-windows-azure-mobile-services-with-the-facebook-sdk-for-windows-phone/

Read the article

Why We Write #6-An Interview with Kevin Kline

- by drsql

Wow, so far in this series, I have interviewed some very good friends, and some truly excellent writers (and usually both), but today, following on the heels of Jason Strate , we are going to hit someone whose name is synonymous with community, a person who really needs no introduction. According to Bing, Kevin Kline ( @kekline ) is the most important Kevin Kline on Twitter (though it clearly could be due to my typical searches, I am giving him the benefit of the doubt… here try it yourself: http://www.bing.com/search?q=kevin+kline+twitter...(read more)

Read the article

Cross-Domain calls using JavaScript in SharePoint Apps

- by Sahil Malik

SharePoint, WCF and Azure Trainings: more information Sounds simple enough right? You’ve probably done $.ajax, and jsonp? Yeah all that doesn’t work in SharePoint. The main reason being, those calls need to work under the app’s credentials. So instead here is what a SharePoint app does, It downloads a file called ~hostweburl/_layouts/15/SPRequestExecutor.js. This file creates an IFrame in your page which then downloads a file called ~appweburl/_layouts/15/AppWebproxy.aspx Then all calls that look like the below, are routed via AppWebProxy and run on the server under the apps identity. 1: var executor = new SP.RequestExecutor(this.appweburl); 2: var url = this.appweburl + "/_api/SP.AppContextSite(@target)/web?" + "@target='" + this.hostweburl + Read full article ....

Read the article

The right way to find a SPUser in SharePoint 2013

- by Sahil Malik

SharePoint, WCF and Azure Trainings: more information Obvious stuff out of the way, SharePoint 2013 is claims and claims only. If you’re still pimping classic windows identities, you’re a fool. But this creates an interesting wrinkle. How the hell is one supposed to find a SPUser? This, especially given that a user id now looks like this - i:0#.w|ws\administrator .. all of those have a meaning .. i stands for identity 0 is the zero’th registered claims provider w before the pipe is windows and after pipe is the final username. What if I had a hotmail account called ws\administrator? You see, browsing through web.SiteUsers, is no longer enough. Not only is it error prone, it won’t work for any other identity type besides Windows. So what is a poor SharePoint developer to do? Easy. Use the cod below instead, Read full article ....

Read the article

Reading Excel using OpenXML

public DataTable ReadDataFromExcel() { string filePath = @"c:/temp/temp.xlsx"; using (SpreadsheetDocument LobjDocument = SpreadsheetDocument.Open(filePath, false)) { WorkbookPart LobjWorkbookPart = LobjDocument.WorkbookPart; Sheet LobjSheetToImport = LobjWorkbookPart.Workbook.Descendants<Sheet>().First<Sheet>(); WorksheetPart LobjWorksheetPart = (WorksheetPart)(LobjWorkbookPart.GetPartById(LobjSheetToImport.Id)); SheetData LobjSheetData = LobjWorksheetPart.Worksheet.Elements<SheetData>().First(); //Read only the data rows and skip all the header rows. int LiRowIterator = 1; // for progress bar int LiTotal = LobjSheetData.Elements<Row>().Count() - MobjImportMapper.HeaderRowIndex; // ================= foreach (Row LobjRowItem in LobjSheetData.Elements<Row>().Skip(6)) { DataRow LdrDataRow = LdtExcelData.NewRow(); int LiColumnIndex = 0; int LiHasData = 0; LdrDataRow[LiColumnIndex] = LobjRowItem.RowIndex; //LiRowIterator; LiColumnIndex++; //TODO: handle restriction of column range. foreach (Cell LobjCellItem in LobjRowItem.Elements<Cell>().Where(PobjCell => ImportHelper.GetColumnIndexFromExcelColumnName(ImportHelper.GetColumnName(PobjCell.CellReference)) <= MobjImportMapper.LastColumnIndex)) { // Gets the column index of the cell with data int LiCellColumnIndex = 10; if (LiColumnIndex < LiCellColumnIndex) { do { LdrDataRow[LiColumnIndex] = string.Empty; LiColumnIndex++; } while (LiColumnIndex < LiCellColumnIndex); } string LstrCellValue = LobjCellItem.InnerText; if (LobjCellItem.DataType != null) { switch (LobjCellItem.DataType.Value) { case CellValues.SharedString: var LobjStringTable = LobjWorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault(); DocumentFormat.OpenXml.OpenXmlElement LXMLElment = null; string LstrXMLString = String.Empty; if (LobjStringTable != null) { LstrXMLString = LobjStringTable.SharedStringTable.ElementAt(int.Parse(LstrCellValue, CultureInfo.InvariantCulture)).InnerXml; if (LstrXMLString.IndexOf("<x:rPh", StringComparison.CurrentCulture) != -1) { LXMLElment = LobjStringTable.SharedStringTable.ElementAt(int.Parse(LstrCellValue, CultureInfo.InvariantCulture)).FirstChild; LstrCellValue = LXMLElment.InnerText; } else { LstrCellValue = LobjStringTable.SharedStringTable.ElementAt(int.Parse(LstrCellValue, CultureInfo.InvariantCulture)).InnerText; } } break; default: break; } } LdrDataRow[LiColumnIndex] = LstrCellValue.Trim(); if (!string.IsNullOrEmpty(LstrCellValue)) LiHasData++; LiColumnIndex++; } if (LiHasData > 0) { LiRowIterator++; LdtExcelData.Rows.Add(LdrDataRow); } } } return LdtExcelData; } span.fullpost {display:none;}

Read the article

Reading XML Content

using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Xml.Linq; using System.Diagnostics; using System.Threading; using System.Xml; using System.Reflection; namespace XMLReading { class Program { static void Main(string[] args) { string fileName = @"C:\temp\t.xml"; List<EmergencyContactXMLDTO> emergencyContacts = new XmlReader<EmergencyContactXMLDTO, EmergencyContactXMLDTOMapper>().Read(fileName); foreach (var item in emergencyContacts) { Console.WriteLine(item.FileNb); } } } public class XmlReader<TDTO, TMAPPER> where TDTO : BaseDTO, new() where TMAPPER : PCPWXMLDTOMapper, new() { public List<TDTO> Read(String fileName) { XmlTextReader reader = new XmlTextReader(fileName); List<TDTO> emergencyContacts = new List<TDTO>(); while (true) { TMAPPER mapper = new TMAPPER(); bool isFound = SeekElement(reader, mapper.GetMainXMLTagName()); if (!isFound) break; TDTO dto = new TDTO(); foreach (var propertyKey in mapper.GetPropertyXMLMap()) { String dtoPropertyName = propertyKey.Key; String xmlPropertyName = propertyKey.Value; SeekElement(reader, xmlPropertyName); SetValue(dto, dtoPropertyName, reader.ReadElementString()); } emergencyContacts.Add(dto); } return emergencyContacts; } private void SetValue(Object dto, String propertyName, String value) { PropertyInfo prop = dto.GetType().GetProperty(propertyName, BindingFlags.Public | BindingFlags.Instance); prop.SetValue(dto, value, null); } private bool SeekElement(XmlTextReader reader, String elementName) { while (reader.Read()) { XmlNodeType nodeType = reader.MoveToContent(); if (nodeType != XmlNodeType.Element) { continue; } if (reader.Name == elementName) { return true; } } return false; } } public class BaseDTO { } public class EmergencyContactXMLDTO : BaseDTO { public string FileNb { get; set; } public string ContactName { get; set; } public string ContactPhoneNumber { get; set; } public string Relationship { get; set; } public string DoctorName { get; set; } public string DoctorPhoneNumber { get; set; } public string HospitalName { get; set; } } public interface PCPWXMLDTOMapper { Dictionary<string, string> GetPropertyXMLMap(); String GetMainXMLTagName(); } public class EmergencyContactXMLDTOMapper : PCPWXMLDTOMapper { public Dictionary<string, string> GetPropertyXMLMap() { return new Dictionary<string, string> { { "FileNb", "XFileNb" }, { "ContactName", "XContactName"}, { "ContactPhoneNumber", "XContactPhoneNumber" }, { "Relationship", "XRelationship" }, { "DoctorName", "XDoctorName" }, { "DoctorPhoneNumber", "XDoctorPhoneNumber" }, { "HospitalName", "XHospitalName" }, }; } public String GetMainXMLTagName() { return "EmergencyContact"; } } } span.fullpost {display:none;}

Read the article

Reading Excel using ClosedXML

I have used closedXML api to read the excel. Here is how you do it. Statistically, this performs better than OpenXML. public DataTable ReadDataFromExcelUsingClosedXML() { string filePath ="@c:/temp/example.xlsx"; var LobjWorkbook = new XLWorkbook(filePath); var LobjWorksheet = LobjWorkbook.Worksheets.First(); var LobjFullRange = LobjWorksheet.RangeUsed(); var LobjUsedRange = LobjWorksheet.Range(MobjImportMapper.HeaderRowIndex + 1, 1, LobjFullRange.RangeAddress.LastAddress.RowNumber, LobjFullRange.RangeAddress.LastAddress.ColumnNumber); var LiNumberOfcolumnsInTheExcel = LobjUsedRange.ColumnCount(); // for progress bar int LiAggregateRowCounter = MobjImportMapper.HeaderRowIndex; int LiTotalNumberOfRows = LobjWorksheet.RowCount() - LiAggregateRowCounter; int LiPercentage = 0; foreach (var LobjRow in LobjUsedRange.RowsUsed()) { int LiTemp = 0; object[] LobjrowData = new object[LiNumberOfcolumnsInTheExcel + 1]; LobjrowData[LiTemp] = LobjRow.RangeAddress.FirstAddress.RowNumber; LiTemp++; LobjRow.Cells().ForEach(PobjCell => LobjrowData[LiTemp++] = PobjCell.Value); LdtExcelData.Rows.Add(LobjrowData); // for progress bar LiPercentage = ((100 * LiAggregateRowCounter / LiTotalNumberOfRows) / 4) * 3; if (LiPercentage > 5) PobjBackgoundWorker.ReportProgress(LiPercentage); LiAggregateRowCounter++; // ===================== } return LdtExcelData; } span.fullpost {display:none;}

Read the article

Different Approaches of Entity Framework

Entity Framework provides three different approaches to deal with the model, and each one has its own pros and cons. Ambily Kavumkal Kamalasanan discusses the advantages of the Model, Database, and Code First approaches to modeling in Entity Framework 5.0. Entity Framework still has its share of issues and is not widely accepted yet - but through contributing to its ongoing development the community can make it more stable and increase its adoption.

Read the article

Windows Azure from a Data Perspective

Before creating a data application in Windows Azure, it is important to make choices based on the type of data you have, as well as the security and the business requirements. There are a wide range of options, because Windows Azure has intrinsic data storage, completely separate from SQL Azure, that is highly available and replicated. Your data requirements are likely to dictate the type of data storage options you choose.

Read the article

Learn Many Languages

- by Phil Factor

Around twenty-five years ago, I was trying to solve the problem of recruiting suitable developers for a large business. I visited the local University (it was a Technical College then). My mission was to remind them that we were a large, local employer of technical people and to suggest that, as they were in the business of educating young people for a career in IT, we should work together. I anticipated a harmonious chat where we could suggest to them the idea of mentioning our name to some of their graduates. It didn’t go well. The academic staff displayed a degree of revulsion towards the whole topic of IT in the world of commerce that surprised me; tweed met charcoal-grey, trainers met black shoes. However, their antipathy to commerce was something we could have worked around, since few of their graduates were destined for a career as university lecturers. They asked me what sort of language skills we needed. I tried ducking the invidious task of naming computer languages, since I wanted recruits who were quick to adapt and learn, with a broad understanding of IT, including development methodologies, technologies, and data. However, they pressed the point and I ended up saying that we needed good working knowledge of C and BASIC, though FORTRAN and COBOL were, at the time, still useful. There was a ghastly silence. It was as if I’d recommended the beliefs and practices of the Bogomils of Bulgaria to a gathering of Cardinals. They stared at me severely, like owls, until the head of department broke the silence, informing me in clipped tones that they taught only Modula 2. Now, I wouldn’t blame you if at this point you hurriedly had to look up ‘Modula 2′ on Wikipedia. Based largely on Pascal, it was a specialist language for embedded systems, but I’ve never ever come across it in a commercial business application. Nevertheless, it was an excellent teaching language since it taught modules, scope control, multiprogramming and the advantages of encapsulating a set of related subprograms and data structures. As long as the course also taught how to transfer these skills to other, more useful languages, it was not necessarily a problem. I said as much, but they gleefully retorted that the biggest local employer, a defense contractor specializing in Radar and military technology, used nothing but Modula 2. “Why teach any other programming language when they will be using Modula 2 for all their working lives?” said a complacent lecturer. On hearing this, I made my excuses and left. There could be no meeting of minds. They were providing training in a specific computer language, not an education in IT. Twenty years later, I once more worked nearby and regularly passed the long-deserted ‘brownfield’ site of the erstwhile largest local employer; the end of the cold war had led to lean times for defense contractors. A digger was about to clear the rubble of the long demolished factory along with the accompanying growth of buddleia and thistles, in order to lay the infrastructure for ‘affordable housing’. Modula 2 was a distant memory. Either those employees had short working lives or they’d retrained in other languages. The University, by contrast, was thriving, but I wondered if their erstwhile graduates had ever cursed the narrow specialization of their training in IT, as they struggled with the unexpected variety of their subsequent careers.

Read the article

Working With Extended Events

- by Fatherjack

SQL Server 2012 has made working with Extended Events (XE) pretty simple when it comes to what sessions you have on your servers and what options you have selected and so forth but if you are like me then you still have some SQL Server instances that are 2008 or 2008 R2. For those servers there is no built-in way to view the Extended Event sessions in SSMS. I keep coming up against the same situations – Where are the xel log files? What events, actions or predicates are set for the events on the server? What sessions are there on the server already? I got tired of this being a perpetual question and wrote some TSQL to save as a snippet in SQL Prompt so that these details are permanently only a couple of clicks away. First, some history. If you just came here for the code skip down a few paragraphs and it’s all there. If you want a little time to reminisce about SQL Server then stick with me through the next paragraph or two. We are in a bit of a cross-over period currently, there are many versions of SQL Server but I would guess that SQL Server 2008, 2008 R2 and 2012 comprise the majority of installations. With each of these comes a set of management tools, of which SQL Server Management Studio (SSMS) is one. In 2008 and 2008 R2 Extended Events made their first appearance and there was no way to work with them in the SSMS interface. At some point the Extended Events guru Jonathan Kehayias (http://www.sqlskills.com/blogs/jonathan/) created the SQL Server 2008 Extended Events SSMS Addin which is really an excellent tool to ease XE session administration. This addin will install in SSMS 2008 or 2008R2 but not SSMS 2012. If you use a compatible version of SSMS then I wholly recommend downloading and using it to make your work with XE much easier. If you have SSMS 2012 installed, and there is no reason not to as it will let you work with all versions of SQL Server, then you cannot install this addin. If you are working with SQL Server 2012 then SSMS 2012 has built in functionality to manage XE sessions – this functionality does not apply for 2008 or 2008 R2 instances though. This means you are somewhat restricted and have to use TSQL to manage XE sessions on older versions of SQL Server. OK, those of you that skipped ahead for the code, you need to start from here: So, you are working with SSMS 2012 but have a SQL Server that is an earlier version that needs an XE session created or you think there is a session created but you aren’t sure, or you know it’s there but can’t remember if it is running and where the output is going. How do you find out? Well, none of the information is hidden as such but it is a bit of a wrangle to locate it and it isn’t a lot of code that is unlikely to remain in your memory. I have created two pieces of code. The first examines the SYS.Server_Event_… management views in combination with the SYS.DM_XE_… management views to give the name of all sessions that exist on the server, regardless of whether they are running or not and two pieces of TSQL code. One piece will alter the state of the session: if the session is running then the code will stop the session if executed and vice versa. The other piece of code will drop the selected session. If the session is running then the code will stop it first. Do not execute the DROP code unless you are sure you have the Create code to hand. It will be dropped from the server without a second chance to change your mind. /**************************************************************/ /*** To locate and describe event sessions on a server ***/ /*** ***/ /*** Generates TSQL to start/stop/drop sessions ***/ /*** ***/ /*** Jonathan Allen - @fatherjack ***/ /*** June 2013 ***/ /*** ***/ /**************************************************************/ SELECT [EES].[name] AS [Session Name - all sessions] , CASE WHEN [MXS].[name] IS NULL THEN ISNULL([MXS].[name], 'Stopped') ELSE 'Running' END AS SessionState , CASE WHEN [MXS].[name] IS NULL THEN ISNULL([MXS].[name], 'ALTER EVENT SESSION [' + [EES].[name] + '] ON SERVER STATE = START;') ELSE 'ALTER EVENT SESSION [' + [EES].[name] + '] ON SERVER STATE = STOP;' END AS ALTER_SessionState , CASE WHEN [MXS].[name] IS NULL THEN ISNULL([MXS].[name], 'DROP EVENT SESSION [' + [EES].[name] + '] ON SERVER; -- This WILL drop the session. It will no longer exist. Don't do it unless you are certain you can recreate it if you need it.') ELSE 'ALTER EVENT SESSION [' + [EES].[name] + '] ON SERVER STATE = STOP; ' + CHAR(10) + '-- DROP EVENT SESSION [' + [EES].[name] + '] ON SERVER; -- This WILL stop and drop the session. It will no longer exist. Don't do it unless you are certain you can recreate it if you need it.' END AS DROP_Session FROM [sys].[server_event_sessions] AS EES LEFT JOIN [sys].[dm_xe_sessions] AS MXS ON [EES].[name] = [MXS].[name] WHERE [EES].[name] NOT IN ( 'system_health', 'AlwaysOn_health' ) ORDER BY SessionState GO I have excluded the system_health and AlwaysOn sessions as I don’t want to accidentally execute the drop script for these sessions that are created as part of the SQL Server installation. It is possible to recreate the sessions but that is a whole lot of aggravation I’d rather avoid. The second piece of code gathers details of running XE sessions only and provides information on the Events being collected, any predicates that are set on those events, the actions that are set to be collected, where the collected information is being logged and if that logging is to a file target, where that file is located. /**********************************************/ /*** Running Session summary ***/ /*** ***/ /*** Details key values of XE sessions ***/ /*** that are in a running state ***/ /*** ***/ /*** Jonathan Allen - @fatherjack ***/ /*** June 2013 ***/ /*** ***/ /**********************************************/ SELECT [EES].[name] AS [Session Name - running sessions] , [EESE].[name] AS [Event Name] , COALESCE([EESE].[predicate], 'unfiltered') AS [Event Predicate Filter(s)] , [EESA].[Action] AS [Event Action(s)] , [EEST].[Target] AS [Session Target(s)] , ISNULL([EESF].[value], 'No file target in use') AS [File_Target_UNC] -- select * FROM [sys].[server_event_sessions] AS EES INNER JOIN [sys].[dm_xe_sessions] AS MXS ON [EES].[name] = [MXS].[name] INNER JOIN [sys].[server_event_session_events] AS [EESE] ON [EES].[event_session_id] = [EESE].[event_session_id] LEFT JOIN [sys].[server_event_session_fields] AS EESF ON ( [EES].[event_session_id] = [EESF].[event_session_id] AND [EESF].[name] = 'filename' ) CROSS APPLY ( SELECT STUFF(( SELECT ', ' + sest.name FROM [sys].[server_event_session_targets] AS SEST WHERE [EES].[event_session_id] = [SEST].[event_session_id] FOR XML PATH('') ), 1, 2, '') AS [Target] ) AS EEST CROSS APPLY ( SELECT STUFF(( SELECT ', ' + [sesa].NAME FROM [sys].[server_event_session_actions] AS sesa WHERE [sesa].[event_session_id] = [EES].[event_session_id] FOR XML PATH('') ), 1, 2, '') AS [Action] ) AS EESA WHERE [EES].[name] NOT IN ( 'system_health', 'AlwaysOn_health' ) /*Optional to exclude 'out-of-the-box' traces*/ I hope that these scripts are useful to you and I would be obliged if you would keep my name in the script comments. I have no problem with you using it in production or personal circumstances, however it has no warranty or guarantee. Don’t use it unless you understand it and are happy with what it is going to do. I am not ever responsible for the consequences of executing this script on your servers.

Daily Archives

Articles indexed Monday June 24 2013

Page 10/22 | < Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17 | Next Page >

- by AaronBertrand

- by Argenis

- by jchang

- by Paul White

- by John Paul Cook

- by Hugo Kornelis

- by Rob Farley

- by Adam Machanic

- by Paul White

- by Kalen Delaney

- by Rob Farley

- by AllenMWhite

- by BuckWoody

- by Hugo Kornelis

- by BuckWoody

- by drsql

- by Sahil Malik

- by Sahil Malik

- by Phil Factor

- by Fatherjack

< Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17 | Next Page >