Search Results

Search found 8340 results on 334 pages for 'merge join'.

Page 1/334 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • In SQL, if we rename INNER JOIN as INTERSECT JOIN, LEFT OUTER JOIN as LEFT UNION JOIN, and FULL OUTE

    - by Jian Lin
    In SQL, the name Join gives an idea of "merging" or a sense of "union", making something bigger. But in fact, as in the other post http://stackoverflow.com/questions/2706051/in-sql-a-join-is-actually-an-intersection-and-it-is-also-a-linkage-or-a-sidew it turns out that a Join (Inner Join) is actually an Intersection. So if we think of Join = Inner Join = Intersect Join Left Outer Join = Left Union Join Full Outer Join = Full Union Join = Union Join then we always get a feel of what's happening, and maybe never forget what they are easily. In a way, we can think of Intersect as "making it less", therefore it is excluding something. That's why the name "Join" won't go with the idea of "Intersect". But in fact, both Intersect and Union can be thought of as: Union: bringing something together and merge them unconditionally. Intersect: bringing something together and merge them based on some condition. so the "bringing something together" is probably what "Join" is all about. It is like, Intersection is a "half glass of water" -- we can thinking of it as "excluding something" or as "bringing something together and accepting the common ones". So if the word "Intersect Join" is used, maybe a clear picture is there, and "Union Join" can be a clear picture too. Maybe the word "Inner Join" and "Outer Join" is very clear when we use SQL a lot. Somehow, the word "Outer" tends to give a feeling that it is "outside" and excluding something rather than a "Union".

    Read the article

  • If Inner Join can be thought of as Cross Join but filtering out the records satisfying the condition

    - by Jian Lin
    If an Inner Join can be thought of as a cross join and then getting the records that satisfy the condition, then a LEFT OUTER JOIN can be thought of as that, plus ONE record on the left table that doesn't satisfy the condition. In other words, it is not a cross join that "goes easy" on the left records (even when the condition is not satisfied), because then the left record can appear many times (as many as how many records there are in the right table). So the LEFT OUTER JOIN is the Cross JOIN with the records satisfying the condition, plus ONE record from the LEFT TABLE that doesn't satisfy the condition.

    Read the article

  • Thinking of an Inner Join as a Cross Join and then satisfying some condition(s)?

    - by Jian Lin
    It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then satisfying some condition(s)? Because the equi-join can be obvious, but the non-equi-join can be a bit confusing. But if we always use the Cross Join, and then filter out the ones satisfying the condition, then we get the resulting table. In other words, we can always analyze it by using the first record on the left table, and then go through every single records on the right, and then repeat that for 2nd record on the left, and for the 3rd, 4th, ... etc. So in our mind, we can analyze it using this way, and it is like O(n^2), although what happens in the DBMS maybe that it is a lot faster (when an index is present). Is there another good way to think of it besides this method?

    Read the article

  • It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then sa

    - by Jian Lin
    It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then satisfying some condition(s)? Because the equi-join can be obvious, but the non-equi-join can be a bit confusing. But if we always use the Cross Join, and then filter out the ones satisfying the condition, then we get the resulting table. In other words, we can always analyze it by using the first record on the left table, and then go through every single records on the right, and then repeat that for 2nd record on the left, and for the 3rd, 4th, ... etc. So in our mind, we can analyze it using this way, and it is like O(n^2), although what happens in the DBMS maybe that it is a lot faster (when an index is present). Is there another good way to think of it besides this method?

    Read the article

  • SQL SERVER – Merge Operations – Insert, Update, Delete in Single Execution

    - by pinaldave
    This blog post is written in response to T-SQL Tuesday hosted by Jorge Segarra (aka SQLChicken). I have been very active using these Merge operations in my development. However, I have found out from my consultancy work and friends that these amazing operations are not utilized by them most of the time. Here is my attempt to bring the necessity of using the Merge Operation to surface one more time. MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted. One of the most important advantages of MERGE statement is that the entire data are read and processed only once. In earlier versions, three different statements had to be written to process three different activities (INSERT, UPDATE or DELETE); however, by using MERGE statement, all the update activities can be done in one pass of database table. I have written about these Merge Operations earlier in my blog post over here SQL SERVER – 2008 – Introduction to Merge Statement – One Statement for INSERT, UPDATE, DELETE. I was asked by one of the readers that how do we know that this operator was doing everything in single pass and was not calling this Merge Operator multiple times. Let us run the same example which I have used earlier; I am listing the same here again for convenience. --Let’s create Student Details and StudentTotalMarks and inserted some records. USE tempdb GO CREATE TABLE StudentDetails ( StudentID INTEGER PRIMARY KEY, StudentName VARCHAR(15) ) GO INSERT INTO StudentDetails VALUES(1,'SMITH') INSERT INTO StudentDetails VALUES(2,'ALLEN') INSERT INTO StudentDetails VALUES(3,'JONES') INSERT INTO StudentDetails VALUES(4,'MARTIN') INSERT INTO StudentDetails VALUES(5,'JAMES') GO CREATE TABLE StudentTotalMarks ( StudentID INTEGER REFERENCES StudentDetails, StudentMarks INTEGER ) GO INSERT INTO StudentTotalMarks VALUES(1,230) INSERT INTO StudentTotalMarks VALUES(2,255) INSERT INTO StudentTotalMarks VALUES(3,200) GO -- Select from Table SELECT * FROM StudentDetails GO SELECT * FROM StudentTotalMarks GO -- Merge Statement MERGE StudentTotalMarks AS stm USING (SELECT StudentID,StudentName FROM StudentDetails) AS sd ON stm.StudentID = sd.StudentID WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE WHEN MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25 WHEN NOT MATCHED THEN INSERT(StudentID,StudentMarks) VALUES(sd.StudentID,25); GO -- Select from Table SELECT * FROM StudentDetails GO SELECT * FROM StudentTotalMarks GO -- Clean up DROP TABLE StudentDetails GO DROP TABLE StudentTotalMarks GO The Merge Join performs very well and the following result is obtained. Let us check the execution plan for the merge operator. You can click on following image to enlarge it. Let us evaluate the execution plan for the Table Merge Operator only. We can clearly see that the Number of Executions property suggests value 1. Which is quite clear that in a single PASS, the Merge Operation completes the operations of Insert, Update and Delete. I strongly suggest you all to use this operation, if possible, in your development. I have seen this operation implemented in many data warehousing applications. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Joins, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Merge

    Read the article

  • SQL – Difference Between INNER JOIN and JOIN

    - by Pinal Dave
    Here is the follow up question to my earlier question SQL – Difference between != and Operator <> used for NOT EQUAL TO Operation. There was a pretty good discussion about this subject earlier and lots of people participated with their opinion. Though the answer was very simple but the conversation was indeed delightful and was indeed very informative. In this blog post I have another following up question to all of you. What is the difference between INNER JOIN and JOIN? If you are working with database you will find developers use above both the kinds of the joins in their SQL Queries. Here is the quick example of the same. Query using INNER JOIN SELECT * FROM Table1 INNER JOIN  Table2 ON Table1.Col1 = Table2.Col1 Query using JOIN SELECT * FROM Table1 JOIN  Table2 ON Table1.Col1 = Table2.Col1 The question is what is the difference between above two syntax. Here is the answer – They are equal to each other. There is absolutely no difference between them. They are equal in performance as well as implementation. JOIN is actually shorter version of INNER JOIN. Personally I prefer to write INNER JOIN because it is much cleaner to read and it avoids any confusion if there is related to JOIN. For example if users had written INNER JOIN instead of JOIN there would have been no confusion in mind and hence there was no need to have original question. Here is the question back to you - Which one of the following syntax do you use when you are inner joining two tables – INNER JOIN or JOIN? and Why? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Joins, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • MERGE Bug with Filtered Indexes

    - by Paul White
    A MERGE statement can fail, and incorrectly report a unique key violation when: The target table uses a unique filtered index; and No key column of the filtered index is updated; and A column from the filtering condition is updated; and Transient key violations are possible Example Tables Say we have two tables, one that is the target of a MERGE statement, and another that contains updates to be applied to the target.  The target table contains three columns, an integer primary key, a single character alternate key, and a status code column.  A filtered unique index exists on the alternate key, but is only enforced where the status code is ‘a’: CREATE TABLE #Target ( pk integer NOT NULL, ak character(1) NOT NULL, status_code character(1) NOT NULL,   PRIMARY KEY (pk) );   CREATE UNIQUE INDEX uq1 ON #Target (ak) INCLUDE (status_code) WHERE status_code = 'a'; The changes table contains just an integer primary key (to identify the target row to change) and the new status code: CREATE TABLE #Changes ( pk integer NOT NULL, status_code character(1) NOT NULL,   PRIMARY KEY (pk) ); Sample Data The sample data for the example is: INSERT #Target (pk, ak, status_code) VALUES (1, 'A', 'a'), (2, 'B', 'a'), (3, 'C', 'a'), (4, 'A', 'd');   INSERT #Changes (pk, status_code) VALUES (1, 'd'), (4, 'a');          Target                     Changes +-----------------------+    +------------------+ ¦ pk ¦ ak ¦ status_code ¦    ¦ pk ¦ status_code ¦ ¦----+----+-------------¦    ¦----+-------------¦ ¦  1 ¦ A  ¦ a           ¦    ¦  1 ¦ d           ¦ ¦  2 ¦ B  ¦ a           ¦    ¦  4 ¦ a           ¦ ¦  3 ¦ C  ¦ a           ¦    +------------------+ ¦  4 ¦ A  ¦ d           ¦ +-----------------------+ The target table’s alternate key (ak) column is unique, for rows where status_code = ‘a’.  Applying the changes to the target will change row 1 from status ‘a’ to status ‘d’, and row 4 from status ‘d’ to status ‘a’.  The result of applying all the changes will still satisfy the filtered unique index, because the ‘A’ in row 1 will be deleted from the index and the ‘A’ in row 4 will be added. Merge Test One Let’s now execute a MERGE statement to apply the changes: MERGE #Target AS t USING #Changes AS c ON c.pk = t.pk WHEN MATCHED AND c.status_code <> t.status_code THEN UPDATE SET status_code = c.status_code; The MERGE changes the two target rows as expected.  The updated target table now contains: +-----------------------+ ¦ pk ¦ ak ¦ status_code ¦ ¦----+----+-------------¦ ¦  1 ¦ A  ¦ d           ¦ <—changed from ‘a’ ¦  2 ¦ B  ¦ a           ¦ ¦  3 ¦ C  ¦ a           ¦ ¦  4 ¦ A  ¦ a           ¦ <—changed from ‘d’ +-----------------------+ Merge Test Two Now let’s repopulate the changes table to reverse the updates we just performed: TRUNCATE TABLE #Changes;   INSERT #Changes (pk, status_code) VALUES (1, 'a'), (4, 'd'); This will change row 1 back to status ‘a’ and row 4 back to status ‘d’.  As a reminder, the current state of the tables is:          Target                        Changes +-----------------------+    +------------------+ ¦ pk ¦ ak ¦ status_code ¦    ¦ pk ¦ status_code ¦ ¦----+----+-------------¦    ¦----+-------------¦ ¦  1 ¦ A  ¦ d           ¦    ¦  1 ¦ a           ¦ ¦  2 ¦ B  ¦ a           ¦    ¦  4 ¦ d           ¦ ¦  3 ¦ C  ¦ a           ¦    +------------------+ ¦  4 ¦ A  ¦ a           ¦ +-----------------------+ We execute the same MERGE statement: MERGE #Target AS t USING #Changes AS c ON c.pk = t.pk WHEN MATCHED AND c.status_code <> t.status_code THEN UPDATE SET status_code = c.status_code; However this time we receive the following message: Msg 2601, Level 14, State 1, Line 1 Cannot insert duplicate key row in object 'dbo.#Target' with unique index 'uq1'. The duplicate key value is (A). The statement has been terminated. Applying the changes using UPDATE Let’s now rewrite the MERGE to use UPDATE instead: UPDATE t SET status_code = c.status_code FROM #Target AS t JOIN #Changes AS c ON t.pk = c.pk WHERE c.status_code <> t.status_code; This query succeeds where the MERGE failed.  The two rows are updated as expected: +-----------------------+ ¦ pk ¦ ak ¦ status_code ¦ ¦----+----+-------------¦ ¦  1 ¦ A  ¦ a           ¦ <—changed back to ‘a’ ¦  2 ¦ B  ¦ a           ¦ ¦  3 ¦ C  ¦ a           ¦ ¦  4 ¦ A  ¦ d           ¦ <—changed back to ‘d’ +-----------------------+ What went wrong with the MERGE? In this test, the MERGE query execution happens to apply the changes in the order of the ‘pk’ column. In test one, this was not a problem: row 1 is removed from the unique filtered index by changing status_code from ‘a’ to ‘d’ before row 4 is added.  At no point does the table contain two rows where ak = ‘A’ and status_code = ‘a’. In test two, however, the first change was to change row 1 from status ‘d’ to status ‘a’.  This change means there would be two rows in the filtered unique index where ak = ‘A’ (both row 1 and row 4 meet the index filtering criteria ‘status_code = a’). The storage engine does not allow the query processor to violate a unique key (unless IGNORE_DUP_KEY is ON, but that is a different story, and doesn’t apply to MERGE in any case).  This strict rule applies regardless of the fact that if all changes were applied, there would be no unique key violation (row 4 would eventually be changed from ‘a’ to ‘d’, removing it from the filtered unique index, and resolving the key violation). Why it went wrong The query optimizer usually detects when this sort of temporary uniqueness violation could occur, and builds a plan that avoids the issue.  I wrote about this a couple of years ago in my post Beware Sneaky Reads with Unique Indexes (you can read more about the details on pages 495-497 of Microsoft SQL Server 2008 Internals or in Craig Freedman’s blog post on maintaining unique indexes).  To summarize though, the optimizer introduces Split, Filter, Sort, and Collapse operators into the query plan to: Split each row update into delete followed by an inserts Filter out rows that would not change the index (due to the filter on the index, or a non-updating update) Sort the resulting stream by index key, with deletes before inserts Collapse delete/insert pairs on the same index key back into an update The effect of all this is that only net changes are applied to an index (as one or more insert, update, and/or delete operations).  In this case, the net effect is a single update of the filtered unique index: changing the row for ak = ‘A’ from pk = 4 to pk = 1.  In case that is less than 100% clear, let’s look at the operation in test two again:          Target                     Changes                   Result +-----------------------+    +------------------+    +-----------------------+ ¦ pk ¦ ak ¦ status_code ¦    ¦ pk ¦ status_code ¦    ¦ pk ¦ ak ¦ status_code ¦ ¦----+----+-------------¦    ¦----+-------------¦    ¦----+----+-------------¦ ¦  1 ¦ A  ¦ d           ¦    ¦  1 ¦ d           ¦    ¦  1 ¦ A  ¦ a           ¦ ¦  2 ¦ B  ¦ a           ¦    ¦  4 ¦ a           ¦    ¦  2 ¦ B  ¦ a           ¦ ¦  3 ¦ C  ¦ a           ¦    +------------------+    ¦  3 ¦ C  ¦ a           ¦ ¦  4 ¦ A  ¦ a           ¦                            ¦  4 ¦ A  ¦ d           ¦ +-----------------------+                            +-----------------------+ From the filtered index’s point of view (filtered for status_code = ‘a’ and shown in nonclustered index key order) the overall effect of the query is:   Before           After +---------+    +---------+ ¦ pk ¦ ak ¦    ¦ pk ¦ ak ¦ ¦----+----¦    ¦----+----¦ ¦  4 ¦ A  ¦    ¦  1 ¦ A  ¦ ¦  2 ¦ B  ¦    ¦  2 ¦ B  ¦ ¦  3 ¦ C  ¦    ¦  3 ¦ C  ¦ +---------+    +---------+ The single net change there is a change of pk from 4 to 1 for the nonclustered index entry ak = ‘A’.  This is the magic performed by the split, sort, and collapse.  Notice in particular how the original changes to the index key (on the ‘ak’ column) have been transformed into an update of a non-key column (pk is included in the nonclustered index).  By not updating any nonclustered index keys, we are guaranteed to avoid transient key violations. The Execution Plans The estimated MERGE execution plan that produces the incorrect key-violation error looks like this (click to enlarge in a new window): The successful UPDATE execution plan is (click to enlarge in a new window): The MERGE execution plan is a narrow (per-row) update.  The single Clustered Index Merge operator maintains both the clustered index and the filtered nonclustered index.  The UPDATE plan is a wide (per-index) update.  The clustered index is maintained first, then the Split, Filter, Sort, Collapse sequence is applied before the nonclustered index is separately maintained. There is always a wide update plan for any query that modifies the database. The narrow form is a performance optimization where the number of rows is expected to be relatively small, and is not available for all operations.  One of the operations that should disallow a narrow plan is maintaining a unique index where intermediate key violations could occur. Workarounds The MERGE can be made to work (producing a wide update plan with split, sort, and collapse) by: Adding all columns referenced in the filtered index’s WHERE clause to the index key (INCLUDE is not sufficient); or Executing the query with trace flag 8790 set e.g. OPTION (QUERYTRACEON 8790). Undocumented trace flag 8790 forces a wide update plan for any data-changing query (remember that a wide update plan is always possible).  Either change will produce a successfully-executing wide update plan for the MERGE that failed previously. Conclusion The optimizer fails to spot the possibility of transient unique key violations with MERGE under the conditions listed at the start of this post.  It incorrectly chooses a narrow plan for the MERGE, which cannot provide the protection of a split/sort/collapse sequence for the nonclustered index maintenance. The MERGE plan may fail at execution time depending on the order in which rows are processed, and the distribution of data in the database.  Worse, a previously solid MERGE query may suddenly start to fail unpredictably if a filtered unique index is added to the merge target table at any point. Connect bug filed here Tests performed on SQL Server 2012 SP1 CUI (build 11.0.3321) x64 Developer Edition © 2012 Paul White – All Rights Reserved Twitter: @SQL_Kiwi Email: [email protected]

    Read the article

  • The current state of a MERGE Destination for SSIS

    - by jamiet
    Hugo Tap asked me on Twitter earlier today whether or not there existed a SSIS Dataflow Destination component that enabled one to MERGE data into a table rather than INSERT it. Its a common request so I thought it might be useful to summarise the current state of play as regards a MERGE destination for SSIS. Firstly, there is no MERGE destination component in the box; that is, when you install SSIS no MERGE Destination will be available. That being said the SSIS team have made available a MERGE destination component via Codeplex which you can get from http://sqlsrvintegrationsrv.codeplex.com/releases/view/19048. I have never used it so cannot vouch for its usefulness although judging by some of the reviews you might not want to set your expectations too high. Your mileage may vary.   In the past it has occurred to me that a built-in way to provide MERGE from the SSIS pipeline would be highly valuable. I assume that this would have to be provided by the database into which you were merging hence in March 2010 I submitted the following two requests to Connect: BULK MERGE (111 votes at the time of writing) [SSIS] BULK MERGE Destination (15 votes) If you think these would be useful feel free to vote them up and add a comment. Lastly, this one is nothing to do with SSIS but if you want to perform a minimally logged MERGE using T-SQL Sunil Agarwal has explained how at Minimal logging and MERGE statement. @Jamiet

    Read the article

  • SQL Server 2005 RIGHT OUTER JOIN not working

    - by CheeseConQueso
    I'm looking up access logs for specific courses. I need to show all the courses even if they don't exist in the logs table. Hence the outer join.... but after trying (presumably) all of the variations of LEFT OUTER, RIGHT OUTER, INNER and placement of the tables within the SQL code, I couldn't get my result. Here's what I am running: SELECT (a.first_name+' '+a.last_name) instructor, c.course_id, COUNT(l.access_date) course_logins, a.logins system_logins, MAX(l.access_date) last_course_login, a.last_login last_system_login FROM lsn_logs l RIGHT OUTER JOIN courses c ON l.course_id = c.course_id, accounts a WHERE l.object_id = 'LOGIN' AND c.course_type = 'COURSE' AND c.course_id NOT LIKE '%TEST%' AND a.account_rights > 2 AND l.user_id = a.username AND ((a.first_name+' '+a.last_name) = c.instructor) GROUP BY c.course_id, a.first_name, a.last_name, a.last_login, a.logins, c.instructor ORDER BY a.last_name, a.first_name, c.course_id, course_logins DESC Is it something in the WHERE clause that's preventing me from getting course_id's that don't exist in lsn_logs? Is it the way I'm joining the tables? Again, in short, I want all course_id's regardless of their existence in lsn_logs.

    Read the article

  • Merge Join component sorted outputs [SSIS]

    - by jamiet
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet  P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

    Read the article

  • Merge Join component sorted outputs [SSIS]

    - by jamiet
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet  P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

    Read the article

  • Difference between EJB Persist & Merge operation

    - by shantala.sankeshwar
    This article gives the difference between EJB Persist & Merge operations with scenarios.Use Case Description Users working on EJB persist & merge operations often have this question in mind " When merge can create new entity as well as modify existing entity,then why do we have 2 separate operations - persist & merge?" The reason is very simple.If we use merge operation to create new entity & if the entity exists then it does not throw any exception,but persist throws exception if the entity already exists.Merge should be used to modify the existing entity.The sql statement that gets executed on persist operation is insert statement.But in case of merge first select statement gets executed & then update sql statement gets executed.Scenario 1: Persist operation to create new Emp recordLet us suppose that we have a Java EE Web Application created with Entities from Emp table & have created session bean with data control. Drop Emp Object(Expand SessionEJBLocal->Constructors under Data Controls) as ADF Parameter form in jspx pageDrop persistEmp(Emp) as ADF CommandButton & provide #{bindings.EmpIterator.currentRow.dataProvider} as the value for emp parameter.Then run this page & provide values for Emp,click on 'persistEmp' button.New Emp record gets created.So when we execute persist operation only insert sql statement gets executed :INSERT INTO EMP (EMPNO, COMM, HIREDATE, ENAME, JOB, DEPTNO, SAL, MGR) VALUES (?, ?, ?, ?, ?, ?, ?, ?)    bind => [2, null, null, e2, null, 10, null, null]Scenario 2: Merge operation to modify existing Emp recordLet us suppose that we have a Java EE Web Application created with Entities from Emp table & have created session bean with data control.Drop empFindAll() Object as ADF form on jspx page.Drop mergeEmp(Emp) operation as commandButton & provide #{bindings.EmpIterator.currentRow.dataProvider} as the value for emp parameter.Then run this page & modify values for Emp record,click on 'mergeEmp' button.The respective Emp record gets modified.So when we execute merge operation select & update sql statements gets executed :SELECT EMPNO, COMM, HIREDATE, ENAME, JOB, DEPTNO, SAL, MGR FROM EMP WHERE (EMPNO = ?) bind => [7566]UPDATE EMP SET ENAME = ? WHERE (EMPNO = ?) bind => [KINGS, 7839]

    Read the article

  • T-SQL (SCD) Slowly Changing Dimension Type 2 using a merge statement

    - by AtulThakor
    Working on stored procedure recently which loads records into a data warehouse I found that the existing record was being expired using an update statement followed by an insert to add the new active record. Playing around with the merge statement you can actually expire the current record and insert a new record within one clean statement. This is how the statement works, we do the normal merge statement to insert a record when there is no match, if we match the record we update the existing record by expiring it and deactivating. At the end of the merge statement we use the output statement to output the staging values for the update,  we wrap the whole merge statement within an insert statement and add new rows for the records which we inserted. I’ve added the full script at the bottom so you can paste it and play around.   1: INSERT INTO ExampleFactUpdate 2: (PolicyID, 3: Status) 4: SELECT -- these columns are returned from the output statement 5: PolicyID, 6: Status 7: FROM 8: ( 9: -- merge statement on unique id in this case Policy_ID 10: MERGE dbo.ExampleFactUpdate dp 11: USING dbo.ExampleStag s 12: ON dp.PolicyID = s.PolicyID 13: WHEN NOT MATCHED THEN -- when we cant match the record we insert a new record record and this is all that happens 14: INSERT (PolicyID,Status) 15: VALUES (s.PolicyID, s.Status) 16: WHEN MATCHED --if it already exists 17: AND ExpiryDate IS NULL -- and the Expiry Date is null 18: THEN 19: UPDATE 20: SET 21: dp.ExpiryDate = getdate(), --we set the expiry on the existing record 22: dp.Active = 0 -- and deactivate the existing record 23: OUTPUT $Action MergeAction, s.PolicyID, s.Status -- the output statement returns a merge action which can 24: ) MergeOutput -- be insert/update/delete, on our example where a record has been updated (or expired in our case 25: WHERE -- we'll filter using a where clause 26: MergeAction = 'Update'; -- here   Complete source for example 1: if OBJECT_ID('ExampleFactUpdate') > 0 2: drop table ExampleFactUpdate 3:  4: Create Table ExampleFactUpdate( 5: ID int identity(1,1), 3: go 6: PolicyID varchar(100), 7: Status varchar(100), 8: EffectiveDate datetime default getdate(), 9: ExpiryDate datetime, 10: Active bit default 1 11: ) 12:  13:  14: insert into ExampleFactUpdate( 15: PolicyID, 16: Status) 17: select 18: 1, 19: 'Live' 20:  21: /*Create Staging Table*/ 22: if OBJECT_ID('ExampleStag') > 0 23: drop table ExampleStag 24: go 25:  26: /*Create example fact table */ 27: Create Table ExampleStag( 28: PolicyID varchar(100), 29: Status varchar(100)) 30:  31: --add some data 32: insert into ExampleStag( 33: PolicyID, 34: Status) 35: select 36: 1, 37: 'Lapsed' 38: union all 39: select 40: 2, 41: 'Quote' 42:  43: select * 44: from ExampleFactUpdate 45:  46: select * 47: from ExampleStag 48:  49:  50: INSERT INTO ExampleFactUpdate 51: (PolicyID, 52: Status) 53: SELECT -- these columns are returned from the output statement 54: PolicyID, 55: Status 56: FROM 57: ( 58: -- merge statement on unique id in this case Policy_ID 59: MERGE dbo.ExampleFactUpdate dp 60: USING dbo.ExampleStag s 61: ON dp.PolicyID = s.PolicyID 62: WHEN NOT MATCHED THEN -- when we cant match the record we insert a new record record and this is all that happens 63: INSERT (PolicyID,Status) 64: VALUES (s.PolicyID, s.Status) 65: WHEN MATCHED --if it already exists 66: AND ExpiryDate IS NULL -- and the Expiry Date is null 67: THEN 68: UPDATE 69: SET 70: dp.ExpiryDate = getdate(), --we set the expiry on the existing record 71: dp.Active = 0 -- and deactivate the existing record 72: OUTPUT $Action MergeAction, s.PolicyID, s.Status -- the output statement returns a merge action which can 73: ) MergeOutput -- be insert/update/delete, on our example where a record has been updated (or expired in our case 74: WHERE -- we'll filter using a where clause 75: MergeAction = 'Update'; -- here 76:  77:  78: select * 79: from ExampleFactUpdate 80: 

    Read the article

  • How to join data frames in R (inner, outer, left, right)?

    - by Dan Goldstein
    Given two data frames df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3))) df2 = data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1))) > df1 CustomerId Product 1 Toaster 2 Toaster 3 Toaster 4 Radio 5 Radio 6 Radio > df2 CustomerId State 2 Alabama 4 Alabama 6 Ohio How can I do database style, i.e., sql style, joins? That is, how do I get: An inner join of df1 and df1 An outer join of df1 and df2 A left outer join of df1 and df2 A right outer join of df1 and df2 P.S. IKT-JARQ (I Know This - Just Adding R Questions) Extra credit: How can I do a sql style select statement?

    Read the article

  • TFS 2010 SDK: Smart Merge - Programmatically Create your own Merge Tool

    - by Tarun Arora
    Technorati Tags: Team Foundation Server 2010,TFS SDK,TFS API,TFS Merge Programmatically,TFS Work Items Programmatically,TFS Administration Console,ALM   The information available in the Merge window in Team Foundation Server 2010 is very important in the decision making during the merging process. However, at present the merge window shows very limited information, more that often you are interested to know the work item, files modified, code reviewer notes, policies overridden, etc associated with the change set. Our friends at Microsoft are working hard to change the game again with vNext, but because at present the merge window is a model window you have to cancel the merge process and go back one after the other to check the additional information you need. If you can relate to what i am saying, you will enjoy this blog post! I will show you how to programmatically create your own merging window using the TFS 2010 API. A few screen shots of the WPF TFS 2010 API – Custom Merging Application that we will be creating programmatically, Excited??? Let’s start coding… 1. Get All Team Project Collections for the TFS Server You can read more on connecting to TFS programmatically on my blog post => How to connect to TFS Programmatically 1: public static ReadOnlyCollection<CatalogNode> GetAllTeamProjectCollections() 2: { 3: TfsConfigurationServer configurationServer = 4: TfsConfigurationServerFactory. 5: GetConfigurationServer(new Uri("http://xxx:8080/tfs/")); 6: 7: CatalogNode catalogNode = configurationServer.CatalogNode; 8: return catalogNode.QueryChildren(new Guid[] 9: { CatalogResourceTypes.ProjectCollection }, 10: false, CatalogQueryOptions.None); 11: } 2. Get All Team Projects for the selected Team Project Collection You can read more on connecting to TFS programmatically on my blog post => How to connect to TFS Programmatically 1: public static ReadOnlyCollection<CatalogNode> GetTeamProjects(string instanceId) 2: { 3: ReadOnlyCollection<CatalogNode> teamProjects = null; 4: 5: TfsConfigurationServer configurationServer = 6: TfsConfigurationServerFactory.GetConfigurationServer(new Uri("http://xxx:8080/tfs/")); 7: 8: CatalogNode catalogNode = configurationServer.CatalogNode; 9: var teamProjectCollections = catalogNode.QueryChildren(new Guid[] {CatalogResourceTypes.ProjectCollection }, 10: false, CatalogQueryOptions.None); 11: 12: foreach (var teamProjectCollection in teamProjectCollections) 13: { 14: if (string.Compare(teamProjectCollection.Resource.Properties["InstanceId"], instanceId, true) == 0) 15: { 16: teamProjects = teamProjectCollection.QueryChildren(new Guid[] { CatalogResourceTypes.TeamProject }, false, 17: CatalogQueryOptions.None); 18: } 19: } 20: 21: return teamProjects; 22: } 3. Get All Branches with in a Team Project programmatically I will be passing the name of the Team Project for which i want to retrieve all the branches. When consuming the ‘Version Control Service’ you have the method QueryRootBranchObjects, you need to pass the recursion type => none, one, full. Full implies you are interested in all branches under that root branch. 1: public static List<BranchObject> GetParentBranch(string projectName) 2: { 3: var branches = new List<BranchObject>(); 4: 5: var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://<ServerName>:8080/tfs/<teamProjectName>")); 6: var versionControl = tfs.GetService<VersionControlServer>(); 7: 8: var allBranches = versionControl.QueryRootBranchObjects(RecursionType.Full); 9: 10: foreach (var branchObject in allBranches) 11: { 12: if (branchObject.Properties.RootItem.Item.ToUpper().Contains(projectName.ToUpper())) 13: { 14: branches.Add(branchObject); 15: } 16: } 17: 18: return branches; 19: } 4. Get All Branches associated to the Parent Branch Programmatically Now that we have the parent branch, it is important to retrieve all child branches of that parent branch. Lets see how we can achieve this using the TFS API. 1: public static List<ItemIdentifier> GetChildBranch(string parentBranch) 2: { 3: var branches = new List<ItemIdentifier>(); 4: 5: var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://<ServerName>:8080/tfs/<CollectionName>")); 6: var versionControl = tfs.GetService<VersionControlServer>(); 7: 8: var i = new ItemIdentifier(parentBranch); 9: var allBranches = 10: versionControl.QueryBranchObjects(i, RecursionType.None); 11: 12: foreach (var branchObject in allBranches) 13: { 14: foreach (var childBranche in branchObject.ChildBranches) 15: { 16: branches.Add(childBranche); 17: } 18: } 19: 20: return branches; 21: } 5. Get Merge candidates between two branches Programmatically Now that we have the parent and the child branch that we are interested to perform a merge between we will use the method ‘GetMergeCandidates’ in the namespace ‘Microsoft.TeamFoundation.VersionControl.Client’ => http://msdn.microsoft.com/en-us/library/bb138934(v=VS.100).aspx 1: public static MergeCandidate[] GetMergeCandidates(string fromBranch, string toBranch) 2: { 3: var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://<ServerName>:8080/tfs/<CollectionName>")); 4: var versionControl = tfs.GetService<VersionControlServer>(); 5: 6: return versionControl.GetMergeCandidates(fromBranch, toBranch, RecursionType.Full); 7: } 6. Get changeset details Programatically Now that we have the changeset id that we are interested in, we can get details of the changeset. The Changeset object contains the properties => http://msdn.microsoft.com/en-us/library/microsoft.teamfoundation.versioncontrol.client.changeset.aspx - Changes: Gets or sets an array of Change objects that comprise this changeset. - CheckinNote: Gets or sets the check-in note of the changeset. - Comment: Gets or sets the comment of the changeset. - PolicyOverride: Gets or sets the policy override information of this changeset. - WorkItems: Gets an array of work items that are associated with this changeset. 1: public static Changeset GetChangeSetDetails(int changeSetId) 2: { 3: var tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri("http://<ServerName>:8080/tfs/<CollectionName>")); 4: var versionControl = tfs.GetService<VersionControlServer>(); 5: 6: return versionControl.GetChangeset(changeSetId); 7: } 7. Possibilities In future posts i will try and extend this idea to explore further possibilities, but few features that i am sure will further help during the merge decision making process would be, - View changed files - Compare modified file with current/previous version - Merge Preview - Last Merge date Any other features that you can think of?

    Read the article

  • Basics of Join Predicate Pushdown in Oracle

    - by Maria Colgan
    Happy New Year to all of our readers! We hope you all had a great holiday season. We start the new year by continuing our series on Optimizer transformations. This time it is the turn of Predicate Pushdown. I would like to thank Rafi Ahmed for the content of this blog.Normally, a view cannot be joined with an index-based nested loop (i.e., index access) join, since a view, in contrast with a base table, does not have an index defined on it. A view can only be joined with other tables using three methods: hash, nested loop, and sort-merge joins. Introduction The join predicate pushdown (JPPD) transformation allows a view to be joined with index-based nested-loop join method, which may provide a more optimal alternative. In the join predicate pushdown transformation, the view remains a separate query block, but it contains the join predicate, which is pushed down from its containing query block into the view. The view thus becomes correlated and must be evaluated for each row of the outer query block. These pushed-down join predicates, once inside the view, open up new index access paths on the base tables inside the view; this allows the view to be joined with index-based nested-loop join method, thereby enabling the optimizer to select an efficient execution plan. The join predicate pushdown transformation is not always optimal. The join predicate pushed-down view becomes correlated and it must be evaluated for each outer row; if there is a large number of outer rows, the cost of evaluating the view multiple times may make the nested-loop join suboptimal, and therefore joining the view with hash or sort-merge join method may be more efficient. The decision whether to push down join predicates into a view is determined by evaluating the costs of the outer query with and without the join predicate pushdown transformation under Oracle's cost-based query transformation framework. The join predicate pushdown transformation applies to both non-mergeable views and mergeable views and to pre-defined and inline views as well as to views generated internally by the optimizer during various transformations. The following shows the types of views on which join predicate pushdown is currently supported. UNION ALL/UNION view Outer-joined view Anti-joined view Semi-joined view DISTINCT view GROUP-BY view Examples Consider query A, which has an outer-joined view V. The view cannot be merged, as it contains two tables, and the join between these two tables must be performed before the join between the view and the outer table T4. A: SELECT T4.unique1, V.unique3 FROM T_4K T4,            (SELECT T10.unique3, T10.hundred, T10.ten             FROM T_5K T5, T_10K T10             WHERE T5.unique3 = T10.unique3) VWHERE T4.unique3 = V.hundred(+) AND       T4.ten = V.ten(+) AND       T4.thousand = 5; The following shows the non-default plan for query A generated by disabling join predicate pushdown. When query A undergoes join predicate pushdown, it yields query B. Note that query B is expressed in a non-standard SQL and shows an internal representation of the query. B: SELECT T4.unique1, V.unique3 FROM T_4K T4,           (SELECT T10.unique3, T10.hundred, T10.ten             FROM T_5K T5, T_10K T10             WHERE T5.unique3 = T10.unique3             AND T4.unique3 = V.hundred(+)             AND T4.ten = V.ten(+)) V WHERE T4.thousand = 5; The execution plan for query B is shown below. In the execution plan BX, note the keyword 'VIEW PUSHED PREDICATE' indicates that the view has undergone the join predicate pushdown transformation. The join predicates (shown here in red) have been moved into the view V; these join predicates open up index access paths thereby enabling index-based nested-loop join of the view. With join predicate pushdown, the cost of query A has come down from 62 to 32.  As mentioned earlier, the join predicate pushdown transformation is cost-based, and a join predicate pushed-down plan is selected only when it reduces the overall cost. Consider another example of a query C, which contains a view with the UNION ALL set operator.C: SELECT R.unique1, V.unique3 FROM T_5K R,            (SELECT T1.unique3, T2.unique1+T1.unique1             FROM T_5K T1, T_10K T2             WHERE T1.unique1 = T2.unique1             UNION ALL             SELECT T1.unique3, T2.unique2             FROM G_4K T1, T_10K T2             WHERE T1.unique1 = T2.unique1) V WHERE R.unique3 = V.unique3 and R.thousand < 1; The execution plan of query C is shown below. In the above, 'VIEW UNION ALL PUSHED PREDICATE' indicates that the UNION ALL view has undergone the join predicate pushdown transformation. As can be seen, here the join predicate has been replicated and pushed inside every branch of the UNION ALL view. The join predicates (shown here in red) open up index access paths thereby enabling index-based nested loop join of the view. Consider query D as an example of join predicate pushdown into a distinct view. We have the following cardinalities of the tables involved in query D: Sales (1,016,271), Customers (50,000), and Costs (787,766).  D: SELECT C.cust_last_name, C.cust_city FROM customers C,            (SELECT DISTINCT S.cust_id             FROM sales S, costs CT             WHERE S.prod_id = CT.prod_id and CT.unit_price > 70) V WHERE C.cust_state_province = 'CA' and C.cust_id = V.cust_id; The execution plan of query D is shown below. As shown in XD, when query D undergoes join predicate pushdown transformation, the expensive DISTINCT operator is removed and the join is converted into a semi-join; this is possible, since all the SELECT list items of the view participate in an equi-join with the outer tables. Under similar conditions, when a group-by view undergoes join predicate pushdown transformation, the expensive group-by operator can also be removed. With the join predicate pushdown transformation, the elapsed time of query D came down from 63 seconds to 5 seconds. Since distinct and group-by views are mergeable views, the cost-based transformation framework also compares the cost of merging the view with that of join predicate pushdown in selecting the most optimal execution plan. Summary We have tried to illustrate the basic ideas behind join predicate pushdown on different types of views by showing example queries that are quite simple. Oracle can handle far more complex queries and other types of views not shown here in the examples. Again many thanks to Rafi Ahmed for the content of this blog post.

    Read the article

  • SQL SERVER- Differences Between Left Join and Left Outer Join

    - by pinaldave
    There are a few questions that I had decided not to discuss on this blog because I think they are very simple and many of us know it. Many times, I even receive not-so positive notes from several readers when I am writing something simple. However, assuming that we know all and beginners should know everything is not the right attitude. Since day 1, I have been keeping a small journal regarding questions that I receive in this blog. There are around 200+ questions I receive every day through emails, comments and occasional phone calls. Yesterday, I received a comment with the following question: What are the differences between Left Join and Left Outer Join? Click here to read original comment. This question has triggered the threshold of receiving the same question repeatedly. Here is the answer: There is absolutely no difference between LEFT JOIN and LEFT OUTER JOIN. The same is true for RIGHT JOIN and RIGHT OUTER JOIN. When you use LEFT JOIN keyword in SQL Server, it means LEFT OUTER JOIN only. I have already written in-depth visual diagram discussing the JOINs. I encourage all of you to read the article for further understanding of the JOINs: Read Introduction to JOINs – Basic of JOINs Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Joins, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Joining on a common table, how do you get a FULL OUTER JOIN to expand on another table?

    - by stimpy77
    I've scoured StackOverflow and Google for an answer to this problem. I'm trying to create a Microsot SQL Server 2008 view. Not a stored procedure. Not a function. Just a query (i.e. a view). I have three tables. The first table defines a common key, let's say "CompanyID". The other two tables have a sometimes-common field, let's say "EmployeeName". I want a single table result that, when my WHERE clause says "WHERE CompanyID = 12" looks like this: CompanyID | TableA | TableB 12 | John Doe | John Doe 12 | Betty Sue | NULL 12 | NULL | Billy Bob I've tried a FULL OUTER JOIN that looks like this: SELECT Company.CompanyID, TableA.EmployeeName, TableB.EmployeeName FROM Company FULL OUTER JOIN TableA ON Company.CompanyID = TableA.CompanyID FULL OUTER JOIN TableB ON Company.CompanyID = TableB.CompanyID AND (TableA.EmployeeName IS NULL OR TableB.EmployeeName IS NULL OR TableB.EmployeeName = TableA.EmployeeName) I'm only getting the NULL from one matched table, I'm not getting the expansion for the other table. In the above sample, I'm basically only getting the first and third rows and not the second. Can someone help me create this query and show me how this is done correctly? BTW I already have a stored procedure that looks very clean and populates an in-memory table, but that isn't what I want. Thanks.

    Read the article

  • Detecting branch reintegration or merge in pre-commit script

    - by Shawn Chin
    Within a pre-commit script, is it possible (and if so, how) to identify commits stemming from an svn merge? svnlook changed ... shows files that have changed, but does not differentiate between merges and manual edits. Ideally, I would also like to differentiate between a standard merge and a merge --reintegrate. Background: I'm exploring the possibility of using pre-commit hooks to enforce SVN usage policies for our project. One of the policies state that some directories (such as /trunk) should not be modified directly, and changed only through the reintegration of feature branches. The pre-commit script would therefore reject all changes made to these directories apart from branch reintegrations. Any ideas? Update: I've explored the svnlook command, and the closest I've got is to detect and parse changes to the svn:mergeinfo property of the directory. This approach has some drawback: svnlook can flag up a change in properties, but not which property was changed. (a diff with the proplist of the previous revision is required) By inspecting changes in svn:mergeinfo, it is possible to detect that svn merge was run. However, there is no way to determine if the commits are purely a result of the merge. Changes manually made after the merge will go undetected. (related post: Diff transaction tree against another path/revision)

    Read the article

  • Most optimal order (of joins) for left join

    - by Ram
    I have 3 tables Table1 (with 1020690 records), Table2(with 289425 records), Table 3(with 83692 records).I have something like this SELECT * FROM Table1 T1 /* OK fine select * is bad when not all columns are needed, this is just an example*/ LEFT JOIN Table2 T2 ON T1.id=T2.id LEFT JOIN Table3 T3 ON T1.id=T3.id and a query like this SELECT * FROM Table1 T1 LEFT JOIN Table3 T3 ON T1.id=T3.id LEFT JOIN Table2 T2 ON T1.id=T2.id The query plan shows me that it uses 2 Merge Join for both the joins. For the first query, the first merge is with T1 and T2 and then with T3. For the second query, the first merge is with T1 and T3 and then with T2. Both these queries take about the same time(40 seconds approx.) or sometimes Query1 takes couple of seconds longer. So my question is, does the join order matter ?

    Read the article

  • LINQ to SQL - Left Outer Join with multiple join conditions

    - by dan
    I have the following SQL which I am trying to translate to LINQ: SELECT f.value FROM period as p LEFT OUTER JOIN facts AS f ON p.id = f.periodid AND f.otherid = 17 WHERE p.companyid = 100 I have seen the typical implementation of the left outer join (ie. into x from y in x.DefaultIfEmpty() etc.) but am unsure how to introduce the other join condition ('AND f.otherid = 17') EDIT Why is the 'AND f.otherid = 17' condition part of the JOIN instead of in the WHERE clause? Because f may not exist for some rows and I still want these rows to be included. If the condition is applied in the WHERE clause, after the JOIN - then I don't get the behaviour I want. Unfortunately this: from p in context.Periods join f in context.Facts on p.id equals f.periodid into fg from fgi in fg.DefaultIfEmpty() where p.companyid == 100 && fgi.otherid == 17 select f.value seems to be equivalent to this: SELECT f.value FROM period as p LEFT OUTER JOIN facts AS f ON p.id = f.periodid WHERE p.companyid = 100 && AND f.otherid = 17 which is not quite what I'm after.

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >