statistics - Page 18 - Developer IT

How can I get the current network interface throughput statistics on Linux/UNIX?

- by DavidM

Tools such as MRTG provide network throughput / bandwidth graphs for the current network utilisation on specific interfaces, such as eth0. How can I return that information at the command line on Linux/UNIX? Preferably this would be without installing anything other than what is available on the system as standard.

Read the article

Using routing (?) to make GET request in Rails so ID doesn't show

- by ale

I have a link in one view myapp/locations that goes to myapp/statistics?id=1 (statistics for location with ID 1) which works fine but it doesn't look pretty. I think I've seen people do this sort of thing without needing the ?id=1? I could use a POST but this is not RESTful. Is there a way I can use routing to allow the user to go to myapp/statistics?id=1 but have the user see myapp/statistics? Many thanks.

Read the article

In Wireshark's Protocol Hierarchy Statistics screen, is the total byte count of a capture the sum of the Bytes column or just the top line (Frame)?

- by Howiecamp

Part 1 - I'm looking at Wireshark's Protocol Hierarchy Statistics screen (sample below), is the total byte count of the capture the sum of the Bytes column or just the top line (Frame)? I'm 99% that it's the latter because of protocol rollup but I wanted to conform. Part 2 - From Wireshark documentation on this screen, "Protocol layers can consist of packets that won't contain any higher layer protocol, so the sum of all higher layer packets may not sum up to the protocols packet count. Example: In the screenshot TCP has 85,83% but the sum of the subprotocols (HTTP, ...) is much less. This may be caused by TCP protocol overhead, e.g. TCP ACK packets won't be counted as packets of the higher layer)." Can you explain this?

Read the article

Database Design: A proper table design for large number of column values.

- by Jake

I wish to perform an experiment many different times. After every trial, I am left with a "large" set of output statistics -- let's say, 1000. I would like to store the outputs of my experiments in a table, but what's the best way...? Option 1 Have a table with 1000 columns. Seems like a bad idea. What if the number of statistics one day exceeds the maximum number of columns? Option 2 Have a table with three columns. Let's say, ID, StatisticType, and StatisticValue. That way, you can have as many statistics as you want. However, reading a single experiments statistics becomes more complicated. Moreover, what if different statistics are different data types?? Any suggestions?

Read the article

postgreSQL - pg_class question

- by Sachin Chourasiya

PostgreSQL stores statistics about tables in the system table called pg_class. The query planner accesses this table for every query. These statistics may only be updated using the analyze command. If the analyze command is not run often, the statistics in this table may not be accurate and the query planner may make poor decisions which can degrade system performance. Another strategy is for the query planner to generate these statistics for each query (including selects, inserts, updates, and deletes). This approach would allow the query planner to have the most up-to-date statistics possible. Why postgres always rely on pg_class instead?

Read the article

postgres SQL - pg_class question

- by Sachin Chourasiya

PostgreSQL stores statistics about tables in the system table called pg_class. The query planner accesses this table for every query. These statistics may only be updated using the analyze command. If the analyze command is not run often, the statistics in this table may not be accurate and the query planner may make poor decisions which can degrade system performance. Another strategy is for the query planner to generate these statistics for each query (including selects, inserts, updates, and deletes). This approach would allow the query planner to have the most up-to-date statistics possible. Why postgres always rely on pg_class instead?

Read the article

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article

SQL SERVER – Three Puzzling Questions – Need Your Answer

- by pinaldave

Last week I had asked three questions on my blog. I got very good response to the questions. I am planning to write summary post for each of three questions next week. Before I write summary post and give credit to all the valid answers. I was wondering if I can bring to notice of all of you this week. Why SELECT * throws an error but SELECT COUNT(*) does not This is indeed very interesting question as not quite many realize that this kind of behavior SQL Server demonstrates out of the box. Once you run both the code and read the explanation it totally makes sense why SQL Server is behaving how it is behaving. Also there is connect item is associated with it. Also read the very first comment by Rob Farley it also shares very interesting detail. Statistics are not Updated but are Created Once This puzzle has multiple right answer. I am glad to see many of the correct answer as a comment to this blog post. Statistics are very important and it really helps SQL Server Engine to come up with optimal execution plan. DBA quite often ignore statistics thinking it does not need to be updated, as they are automatically maintained if proper database setting is configured (auto update and auto create). Well, in this question, we have scenario even though auto create and auto update statistics are ON, statistics is not updated. There are multiple solutions but what will be your solution in this case? When to use Function and When to use Stored Procedure This question is rather open ended question – there is no right or wrong answer. Everybody developer has always used functions and stored procedures. Here is the chance to justify when to use Stored Procedure and when to use Functions. I want to acknowledge that they can be used interchangeably but there are few reasons when one should not do that. There are few reasons when one is better than other. Let us discuss this here. Your opinion matters. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Performance, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLAuthority News, SQLServer, T SQL, Technology

Read the article

What algorithms can I use to detect if articles or posts are duplicates?

- by michael

I'm trying to detect if an article or forum post is a duplicate entry within the database. I've given this some thought, coming to the conclusion that someone who duplicate content will do so using one of the three (in descending difficult to detect): simple copy paste the whole text copy and paste parts of text merging it with their own copy an article from an external site and masquerade as their own Prepping Text For Analysis Basically any anomalies; the goal is to make the text as "pure" as possible. For more accurate results, the text is "standardized" by: Stripping duplicate white spaces and trimming leading and trailing. Newlines are standardized to \n. HTML tags are removed. Using a RegEx called Daring Fireball URLs are stripped. I use BB code in my application so that goes to. (ä)ccented and foreign (besides Enlgish) are converted to their non foreign form. I store information about each article in (1) statistics table and in (2) keywords table. (1) Statistics Table The following statistics are stored about the textual content (much like this post) text length letter count word count sentence count average words per sentence automated readability index gunning fog score For European languages Coleman-Liau and Automated Readability Index should be used as they do not use syllable counting, so should produce a reasonably accurate score. (2) Keywords Table The keywords are generated by excluding a huge list of stop words (common words), e.g., 'the', 'a', 'of', 'to', etc, etc. Sample Data text_length, 3963 letter_count, 3052 word_count, 684 sentence_count, 33 word_per_sentence, 21 gunning_fog, 11.5 auto_read_index, 9.9 keyword 1, killed keyword 2, officers keyword 3, police It should be noted that once an article gets updated all of the above statistics are regenerated and could be completely different values. How could I use the above information to detect if an article that's being published for the first time, is already existing within the database? I'm aware anything I'll design will not be perfect, the biggest risk being (1) Content that is not a duplicate will be flagged as duplicate (2) The system allows the duplicate content through. So the algorithm should generate a risk assessment number from 0 being no duplicate risk 5 being possible duplicate and 10 being duplicate. Anything above 5 then there's a good possibility that the content is duplicate. In this case the content could be flagged and linked to the article's that are possible duplicates and a human could decide whether to delete or allow. As I said before I'm storing keywords for the whole article, however I wonder if I could do the same on paragraph basis; this would also mean further separating my data in the DB but it would also make it easier for detecting (2) in my initial post. I'm thinking weighted average between the statistics, but in what order and what would be the consequences...

Read the article

I/O intensive MySql server on Amazon AWS

- by rhossi

We recently moved from a traditional Data Center to cloud computing on AWS. We are developing a product in partnership with another company, and we need to create a database server for the product we'll release. I have been using Amazon Web Services for the past 3 years, but this is the first time I received a spec with this very specific hardware configuration. I know there are trade-offs and that real hardware will always be faster than virtual machines, and knowing that fact forehand, what would you recommend? 1) Amazon EC2? 2) Amazon RDS? 3) Something else? 4) Forget it baby, stick to the real hardware Here is the hardware requirements This server will be focused on I/O and MySQL for the statistics, memory size and disk space for the images hosting. Server 1 I/O The very main part on this server will be I/O processing, FusionIO cards have proven themselves extremely efficient, this is currently the best you can have in this domain. o Fusion ioDrive2 MLC 365GB (http://www.fusionio.com/load/-media-/1m66wu/docsLibrary/FIO_ioDrive2_Datasheet.pdf) CPU MySQL will use less CPU cores than Apache but it will use them very hard, the E7 family has 30M Cache L3 wichi provide boost performance : o 1x Intel E7-2870 will be ok. Storage SAS will be good enough in terms of performance, especially considering the space required. o RAID 10 of 4 x SAS 10k or 15k for a total available space of 512 GB. Memory o 64 GB minimum is required on this server considering the size of the statistics database. Warning: the statistics database will grow quickly, if possible consider starting with 128 GB directly, it will help. This server will be focused on I/O and MySQL for the statistics, memory size and disk space for the images hosting. Server 2 I/O The very main part on this server will be I/O processing, FusionIO cards have proven themselves extremely efficient, this is currently the best you can have in this domain. o Fusion ioDrive2 MLC 365GB (http://www.fusionio.com/load/-media-/1m66wu/docsLibrary/FIO_ioDrive2_Datasheet.pdf) CPU MySQL will use less CPU cores than Apache but it will use them very hard, the E7 family has 30M Cache L3 wichi provide boost performance : o 1x Intel E7-2870 will be ok. Storage SAS will be good enough in terms of performance, especially considering the space required. o RAID 10 of 4 x SAS 10k or 15k for a total available space of 512 GB. Memory o 64 GB minimum is required on this server considering the size of the statistics database. Warning: the statistics database will grow quickly, if possible consider starting with 128 GB directly, it will help. Thanks in advance. Best,

Read the article

SQL SERVER – Guest Posts – Feodor Georgiev – The Context of Our Database Environment – Going Beyond the Internal SQL Server Waits – Wait Type – Day 21 of 28

- by pinaldave

This guest post is submitted by Feodor. Feodor Georgiev is a SQL Server database specialist with extensive experience of thinking both within and outside the box. He has wide experience of different systems and solutions in the fields of architecture, scalability, performance, etc. Feodor has experience with SQL Server 2000 and later versions, and is certified in SQL Server 2008. In this article Feodor explains the server-client-server process, and concentrated on the mutual waits between client and SQL Server. This is essential in grasping the concept of waits in a ‘global’ application plan. Recently I was asked to write a blog post about the wait statistics in SQL Server and since I had been thinking about writing it for quite some time now, here it is. It is a wide-spread idea that the wait statistics in SQL Server will tell you everything about your performance. Well, almost. Or should I say – barely. The reason for this is that SQL Server is always a part of a bigger system – there are always other players in the game: whether it is a client application, web service, any other kind of data import/export process and so on. In short, the SQL Server surroundings look like this: This means that SQL Server, aside from its internal waits, also depends on external waits and settings. As we can see in the picture above, SQL Server needs to have an interface in order to communicate with the surrounding clients over the network. For this communication, SQL Server uses protocol interfaces. I will not go into detail about which protocols are best, but you can read this article. Also, review the information about the TDS (Tabular data stream). As we all know, our system is only as fast as its slowest component. This means that when we look at our environment as a whole, the SQL Server might be a victim of external pressure, no matter how well we have tuned our database server performance. Let’s dive into an example: let’s say that we have a web server, hosting a web application which is using data from our SQL Server, hosted on another server. The network card of the web server for some reason is malfunctioning (think of a hardware failure, driver failure, or just improper setup) and does not send/receive data faster than 10Mbs. On the other end, our SQL Server will not be able to send/receive data at a faster rate either. This means that the application users will notify the support team and will say: “My data is coming very slow.” Now, let’s move on to a bit more exciting example: imagine that there is a similar setup as the example above – one web server and one database server, and the application is not using any stored procedure calls, but instead for every user request the application is sending 80kb query over the network to the SQL Server. (I really thought this does not happen in real life until I saw it one day.) So, what happens in this case? To make things worse, let’s say that the 80kb query text is submitted from the application to the SQL Server at least 100 times per minute, and as often as 300 times per minute in peak times. Here is what happens: in order for this query to reach the SQL Server, it will have to be broken into a of number network packets (according to the packet size settings) – and will travel over the network. On the other side, our SQL Server network card will receive the packets, will pass them to our network layer, the packets will get assembled, and eventually SQL Server will start processing the query – parsing, allegorizing, generating the query execution plan and so on. So far, we have already had a serious network overhead by waiting for the packets to reach our Database Engine. There will certainly be some processing overhead – until the database engine deals with the 80kb query and its 20 subqueries. The waits you see in the DMVs are actually collected from the point the query reaches the SQL Server and the packets are assembled. Let’s say that our query is processed and it finally returns 15000 rows. These rows have a certain size as well, depending on the data types returned. This means that the data will have converted to packages (depending on the network size package settings) and will have to reach the application server. There will also be waits, however, this time you will be able to see a wait type in the DMVs called ASYNC_NETWORK_IO. What this wait type indicates is that the client is not consuming the data fast enough and the network buffers are filling up. Recently Pinal Dave posted a blog on Client Statistics. What Client Statistics does is captures the physical flow characteristics of the query between the client(Management Studio, in this case) and the server and back to the client. As you see in the image, there are three categories: Query Profile Statistics, Network Statistics and Time Statistics. Number of server roundtrips–a roundtrip consists of a request sent to the server and a reply from the server to the client. For example, if your query has three select statements, and they are separated by ‘GO’ command, then there will be three different roundtrips. TDS Packets sent from the client – TDS (tabular data stream) is the language which SQL Server speaks, and in order for applications to communicate with SQL Server, they need to pack the requests in TDS packets. TDS Packets sent from the client is the number of packets sent from the client; in case the request is large, then it may need more buffers, and eventually might even need more server roundtrips. TDS packets received from server –is the TDS packets sent by the server to the client during the query execution. Bytes sent from client – is the volume of the data set to our SQL Server, measured in bytes; i.e. how big of a query we have sent to the SQL Server. This is why it is best to use stored procedures, since the reusable code (which already exists as an object in the SQL Server) will only be called as a name of procedure + parameters, and this will minimize the network pressure. Bytes received from server – is the amount of data the SQL Server has sent to the client, measured in bytes. Depending on the number of rows and the datatypes involved, this number will vary. But still, think about the network load when you request data from SQL Server. Client processing time – is the amount of time spent in milliseconds between the first received response packet and the last received response packet by the client. Wait time on server replies – is the time in milliseconds between the last request packet which left the client and the first response packet which came back from the server to the client. Total execution time – is the sum of client processing time and wait time on server replies (the SQL Server internal processing time) Here is an illustration of the Client-server communication model which should help you understand the mutual waits in a client-server environment. Keep in mind that a query with a large ‘wait time on server replies’ means the server took a long time to produce the very first row. This is usual on queries that have operators that need the entire sub-query to evaluate before they proceed (for example, sort and top operators). However, a query with a very short ‘wait time on server replies’ means that the query was able to return the first row fast. However a long ‘client processing time’ does not necessarily imply the client spent a lot of time processing and the server was blocked waiting on the client. It can simply mean that the server continued to return rows from the result and this is how long it took until the very last row was returned. The bottom line is that developers and DBAs should work together and think carefully of the resource utilization in the client-server environment. From experience I can say that so far I have seen only cases when the application developers and the Database developers are on their own and do not ask questions about the other party’s world. I would recommend using the Client Statistics tool during new development to track the performance of the queries, and also to find a synchronous way of utilizing resources between the client – server – client. Here is another example: think about similar setup as above, but add another server to the game. Let’s say that we keep our media on a separate server, and together with the data from our SQL Server we need to display some images on the webpage requested by our user. No matter how simple or complicated the logic to get the images is, if the images are 500kb each our users will get the page slowly and they will still think that there is something wrong with our data. Anyway, I don’t mean to get carried away too far from SQL Server. Instead, what I would like to say is that DBAs should also be aware of ‘the big picture’. I wrote a blog post a while back on this topic, and if you are interested, you can read it here about the big picture. And finally, here are some guidelines for monitoring the network performance and improving it: Run a trace and outline all queries that return more than 1000 rows (in Profiler you can actually filter and sort the captured trace by number of returned rows). This is not a set number; it is more of a guideline. The general thought is that no application user can consume that many rows at once. Ask yourself and your fellow-developers: ‘why?’. Monitor your network counters in Perfmon: Network Interface:Output queue length, Redirector:Network errors/sec, TCPv4: Segments retransmitted/sec and so on. Make sure to establish a good friendship with your network administrator (buy them coffee, for example J ) and get into a conversation about the network settings. Have them explain to you how the network cards are setup – are they standalone, are they ‘teamed’, what are the settings – full duplex and so on. Find some time to read a bit about networking. In this short blog post I hope I have turned your attention to ‘the big picture’ and the fact that there are other factors affecting our SQL Server, aside from its internal workings. As a further reading I would still highly recommend the Wait Stats series on this blog, also I would recommend you have the coffee break conversation with your network admin as soon as possible. This guest post is written by Feodor Georgiev. Read all the post in the Wait Types and Queue series. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL

Read the article

SQL SERVER – Get All the Information of Database using sys.databases

- by pinaldave

Earlier I wrote blog article SQL SERVER – Finding Last Backup Time for All Database. In the response of this article I have received very interesting script from SQL Server Expert Matteo as a comment in the blog. He has written script using sys.databases which provides plenty of the information about database. I suggest you can run this on your database and know unknown of your databases as well. SELECT database_id, CONVERT(VARCHAR(25), DB.name) AS dbName, CONVERT(VARCHAR(10), DATABASEPROPERTYEX(name, 'status')) AS [Status], state_desc, (SELECT COUNT(1) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'rows') AS DataFiles, (SELECT SUM((size*8)/1024) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'rows') AS [Data MB], (SELECT COUNT(1) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'log') AS LogFiles, (SELECT SUM((size*8)/1024) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'log') AS [Log MB], user_access_desc AS [User access], recovery_model_desc AS [Recovery model], CASE compatibility_level WHEN 60 THEN '60 (SQL Server 6.0)' WHEN 65 THEN '65 (SQL Server 6.5)' WHEN 70 THEN '70 (SQL Server 7.0)' WHEN 80 THEN '80 (SQL Server 2000)' WHEN 90 THEN '90 (SQL Server 2005)' WHEN 100 THEN '100 (SQL Server 2008)' END AS [compatibility level], CONVERT(VARCHAR(20), create_date, 103) + ' ' + CONVERT(VARCHAR(20), create_date, 108) AS [Creation date], -- last backup ISNULL((SELECT TOP 1 CASE TYPE WHEN 'D' THEN 'Full' WHEN 'I' THEN 'Differential' WHEN 'L' THEN 'Transaction log' END + ' – ' + LTRIM(ISNULL(STR(ABS(DATEDIFF(DAY, GETDATE(),Backup_finish_date))) + ' days ago', 'NEVER')) + ' – ' + CONVERT(VARCHAR(20), backup_start_date, 103) + ' ' + CONVERT(VARCHAR(20), backup_start_date, 108) + ' – ' + CONVERT(VARCHAR(20), backup_finish_date, 103) + ' ' + CONVERT(VARCHAR(20), backup_finish_date, 108) + ' (' + CAST(DATEDIFF(second, BK.backup_start_date, BK.backup_finish_date) AS VARCHAR(4)) + ' ' + 'seconds)' FROM msdb..backupset BK WHERE BK.database_name = DB.name ORDER BY backup_set_id DESC),'-') AS [Last backup], CASE WHEN is_fulltext_enabled = 1 THEN 'Fulltext enabled' ELSE '' END AS [fulltext], CASE WHEN is_auto_close_on = 1 THEN 'autoclose' ELSE '' END AS [autoclose], page_verify_option_desc AS [page verify option], CASE WHEN is_read_only = 1 THEN 'read only' ELSE '' END AS [read only], CASE WHEN is_auto_shrink_on = 1 THEN 'autoshrink' ELSE '' END AS [autoshrink], CASE WHEN is_auto_create_stats_on = 1 THEN 'auto create statistics' ELSE '' END AS [auto create statistics], CASE WHEN is_auto_update_stats_on = 1 THEN 'auto update statistics' ELSE '' END AS [auto update statistics], CASE WHEN is_in_standby = 1 THEN 'standby' ELSE '' END AS [standby], CASE WHEN is_cleanly_shutdown = 1 THEN 'cleanly shutdown' ELSE '' END AS [cleanly shutdown] FROM sys.databases DB ORDER BY dbName, [Last backup] DESC, NAME Please let me know if you find this information useful. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article

July, the 31 Days of SQL Server DMO’s – Day 28 (sys.dm_db_stats_properties)

- by Tamarick Hill

The sys.dm_db_stats_properties Dynamic Management Function returns information about the statistics that are currently on your database objects. This function takes two parameters, an object_id and a stats_id. Let’s have a look at the result set from this function against the AdventureWorks2012.Sales.SalesOrderHeader table. To obtain the object_id and stats_id I will use a CROSS APPLY with the sys.stats system table. SELECT sp.* FROM sys.stats s CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.Stats_id) sp WHERE sp.object_id = object_id('Sales.SalesOrderHeader') The first two columns returned by this function are the object_id and the stats_id columns. The next column, ‘last_updated’, gives you the date and the time that a particular statistic was last updated. The next column, ‘rows’, gives you the total number of rows in the table as of the last statistic update date. The ‘rows_sampled’ column gives you the number of rows that were sampled to create the statistic. The ‘steps’ column represents the number of specific value ranges from the statistic histogram. The ‘unfiltered_rows’ column represents the number of rows before any filters are applied. If a particular statistic is not filtered, the ‘unfiltered_rows’ column will always equal the ‘rows’ column. Lastly we have the ‘modification_counter’ column which represents the number of modification to the leading column in a given statistic since the last time the statistic was updated. Probably the most important column from this Dynamic Management Function is the ‘last_updated’ column. You want to always ensure that you have accurate and updated statistics on your database objects. Accurate statistics are vital for the query optimizer to generate efficient and reliable query execution plans. Without accurate and updated statistics, the performance of your SQL Server would likely suffer. For more information about this Dynamic Management Function, please see the below Books Online link: http://msdn.microsoft.com/en-us/library/jj553546.aspx Folllow me on Twitter @PrimeTimeDBA

Read the article

Best design for a "Command Executer" class

- by Justin984

Sorry for the vague title, I couldn't think of a way to condense the question. I am building an application that will run as a background service and intermittently collect data about the system its running on. A second Android controller application will query the system over tcp/ip for statistics about the system. Currently, the background service has a tcp listener class that reads/writes bytes from a socket. When data is received, it raises an event to notify the service. The service takes the bytes, feeds them into a command parser to figure out what is being requested, and then passes the parsed command to a command executer class. When the service receives a "query statistics" command, it should return statistics over the tcp/ip connection. Currently, all of these classes are fully decoupled from each other. But in order for the command executer to return statistics, it will obviously need access to the socket somehow. For reasons I can't completely articulate, it feels wrong for the command executer to have a direct reference to the socket. I'm looking for strategies and/or design patterns I can use to return data over the socket while keeping the classes decoupled, if this is possible. Hopefully this makes sense, please let me know if I can include any info that would make the question easier to understand.

Read the article

?12c database ????Adaptive Execution Plans ????????

- by Liu Maclean(???)

12c R1 ????SQL??????- Adaptive Execution Plans ????????,???????optimizer ??????(runtime)???????????????, ????????????????????? SQL???????? ????????????, ?????????????????????????????????????????????????????????????adaptive plan ????????????????????????????????????,?????subplan???????????????????? ??????, ???????? ???????????????,?????????, ?????? ???????????????”???”????, ???????????????????buffer ??????? ????????????,?????,??????????????????? ???optimizer ?????????????????????????,?????????????????????????????????????????plan???? ??12C?????????????, ???????????????????,?????? ???????????? ????????????2???: Dynamic Plans????: ???????????????????????;??????,???optimizer??????????subplans??????????????, ???????????????????,?????????????? Reoptimization????: ?Dynamic Plans????,Reoptimization??????????????????????Reoptimization??,?????????????????????????,??reoptimization????? OPTIMIZER_ADAPTIVE_REPORTING_ONLY ???? report-only????????????????TRUE,?????????report-only????,???????????????,??????????????? Dynamic Plans ??????????????,????????????????????????, ?????????????,???????????,????????????????????????????????????????? ?????????????final plan??????????????default plan, ??final plan?default plan???????,????????????? subplan ???????????????,???????????????????????? ??????,???????statistics collector ?buffer???????????statistics collector?????????????????,???????????????????????????? ?????????????????????????????????????????,??????????,?????????????? ???????????,???????buffer???? ???????????????,?????????????????????????????,??????buffer,??????final plan? ????????,???????????????????????,????????????????? ?V$SQL??????IS_RESOLVED_DYNAMIC_PLAN??????????final plan???default plan? ??????dynamic plan ???????SQL PLAN directives?????? declare cursor PLAN_DIRECTIVE_IDS is select directive_id from DBA_SQL_PLAN_DIRECTIVES; begin for z in PLAN_DIRECTIVE_IDS loop DBMS_SPD.DROP_SQL_PLAN_DIRECTIVE(z.directive_id); end loop; end; / explain plan for select /*MALCEAN*/ product_name from oe.order_items o, oe.product_information p where o.unit_price=15 and quantity>1 and p.product_id=o.product_id; select * from table(dbms_xplan.display()); Plan hash value: 1255158658 www.askmaclean.com ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 128 | 7 (0)| 00:00:01 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 4 | 128 | 7 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | ORDER_ITEMS | 4 | 48 | 3 (0)| 00:00:01 | |* 4 | INDEX UNIQUE SCAN | PRODUCT_INFORMATION_PK | 1 | | 0 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| PRODUCT_INFORMATION | 1 | 20 | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("O"."UNIT_PRICE"=15 AND "QUANTITY">1) 4 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID") alter session set events '10053 trace name context forever,level 1'; OR alter session set events 'trace[SQL_Plan_Directive] disk highest'; select /*MALCEAN*/ product_name from oe.order_items o, oe.product_information p where o.unit_price=15 and quantity>1 and p.product_id=o.product_id; ---------------------------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------------------------+-----------------------------------+ | 0 | SELECT STATEMENT | | | | 7 | | | 1 | HASH JOIN | | 4 | 128 | 7 | 00:00:01 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 4 | 128 | 7 | 00:00:01 | | 4 | STATISTICS COLLECTOR | | | | | | | 5 | TABLE ACCESS FULL | ORDER_ITEMS | 4 | 48 | 3 | 00:00:01 | | 6 | INDEX UNIQUE SCAN | PRODUCT_INFORMATION_PK| 1 | | 0 | | | 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_INFORMATION | 1 | 20 | 1 | 00:00:01 | | 8 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 20 | 1 | 00:00:01 | ---------------------------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 1 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID") 5 - filter(("O"."UNIT_PRICE"=15 AND "QUANTITY">1)) 6 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID") ===================================== SPD: BEGIN context at statement level ===================================== Stmt: ******* UNPARSED QUERY IS ******* SELECT /*+ OPT_ESTIMATE (@"SEL$1" JOIN ("P"@"SEL$1" "O"@"SEL$1") ROWS=13.000000 ) OPT_ESTIMATE (@"SEL$1" TABLE "O"@"SEL$1" ROWS=13.000000 ) */ "P"."PRODUCT_NAME" "PRODUCT_NAME" FROM "OE"."ORDER_ITEMS" "O","OE"."PRODUCT_INFORMATION" "P" WHERE "O"."UNIT_PRICE"=15 AND "O"."QUANTITY">1 AND "P"."PRODUCT_ID"="O"."PRODUCT_ID" Objects referenced in the statement PRODUCT_INFORMATION[P] 92194, type = 1 ORDER_ITEMS[O] 92197, type = 1 Objects in the hash table Hash table Object 92197, type = 1, ownerid = 6573730143572393221: No Dynamic Sampling Directives for the object Hash table Object 92194, type = 1, ownerid = 17822962561575639002: No Dynamic Sampling Directives for the object Return code in qosdInitDirCtx: ENBLD =================================== SPD: END context at statement level =================================== ======================================= SPD: BEGIN context at query block level ======================================= Query Block SEL$1 (#0) Return code in qosdSetupDirCtx4QB: NOCTX ===================================== SPD: END context at query block level ===================================== SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE SPD: Generating finding id: type = 1, reason = 1, objcnt = 1, obItr = 0, objid = 92197, objtyp = 1, vecsize = 6, colvec = [4, 5, ], fid = 2896834833840853267 SPD: Inserted felem, fid=2896834833840853267, ftype = 1, freason = 1, dtype = 0, dstate = 0, dflag = 0, ver = YES, keep = YES SPD: qosdCreateFindingSingTab retCode = CREATED, fid = 2896834833840853267 SPD: qosdCreateDirCmp retCode = CREATED, fid = 2896834833840853267 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SKIP_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER SPD: Generating finding id: type = 1, reason = 1, objcnt = 1, obItr = 0, objid = 92197, objtyp = 1, vecsize = 6, colvec = [4, 5, ], fid = 2896834833840853267 SPD: Modified felem, fid=2896834833840853267, ftype = 1, freason = 1, dtype = 0, dstate = 0, dflag = 0, ver = YES, keep = YES SPD: Generating finding id: type = 1, reason = 1, objcnt = 1, obItr = 0, objid = 92194, objtyp = 1, vecsize = 2, colvec = [1, ], fid = 5618517328604016300 SPD: Modified felem, fid=5618517328604016300, ftype = 1, freason = 1, dtype = 0, dstate = 0, dflag = 0, ver = NO, keep = NO SPD: Generating finding id: type = 1, reason = 1, objcnt = 1, obItr = 0, objid = 92194, objtyp = 1, vecsize = 2, colvec = [1, ], fid = 1142802697078608149 SPD: Modified felem, fid=1142802697078608149, ftype = 1, freason = 1, dtype = 0, dstate = 0, dflag = 0, ver = NO, keep = NO SPD: Generating finding id: type = 1, reason = 2, objcnt = 2, obItr = 0, objid = 92194, objtyp = 1, vecsize = 0, obItr = 1, objid = 92197, objtyp = 1, vecsize = 0, fid = 1437680122701058051 SPD: Modified felem, fid=1437680122701058051, ftype = 1, freason = 2, dtype = 0, dstate = 0, dflag = 0, ver = NO, keep = NO select * from table(dbms_xplan.display_cursor(format=>'report')) ; ????report????adaptive plan Adaptive plan: ------------- This cursor has an adaptive plan, but adaptive plans are enabled for reporting mode only. The plan that would be executed if adaptive plans were enabled is displayed below. ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 7 (100)| | |* 1 | HASH JOIN | | 4 | 128 | 7 (0)| 00:00:01 | |* 2 | TABLE ACCESS FULL| ORDER_ITEMS | 4 | 48 | 3 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL| PRODUCT_INFORMATION | 1 | 20 | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ SQL> select SQL_ID,IS_RESOLVED_DYNAMIC_PLAN,sql_text from v$SQL WHERE SQL_TEXT like '%MALCEAN%' and sql_text not like '%like%'; SQL_ID IS -------------------------- -- SQL_TEXT -------------------------------------------------------------------------------- 6ydj1bn1bng17 Y select /*MALCEAN*/ product_name from oe.order_items o, oe.product_information p where o.unit_price=15 and quantity>1 and p.product_id=o.product_id ???? explain plan for ????default plan, ??????optimizer???final plan,??V$SQL.IS_RESOLVED_DYNAMIC_PLAN???Y,????????????? DBA_SQL_PLAN_DIRECTIVES?????????????SQL PLAN DIRECTIVES, ???12c? ???MMON?????DML ???column usage??????????,????SMON??? MMON????SGA??PLAN DIRECTIVES??? ?????DBMS_SPD.flush_sql_plan_directive???? select directive_id,type,reason from DBA_SQL_PLAN_DIRECTIVES / DIRECTIVE_ID TYPE REASON ----------------------------------- -------------------------------- ----------------------------- 10321283028317893030 DYNAMIC_SAMPLING JOIN CARDINALITY MISESTIMATE 4757086536465754886 DYNAMIC_SAMPLING JOIN CARDINALITY MISESTIMATE 16085268038103121260 DYNAMIC_SAMPLING JOIN CARDINALITY MISESTIMATE SQL> set pages 9999 SQL> set lines 300 SQL> col state format a5 SQL> col subobject_name format a11 SQL> col col_name format a11 SQL> col object_name format a13 SQL> select d.directive_id, o.object_type, o.object_name, o.subobject_name col_name, d.type, d.state, d.reason 2 from dba_sql_plan_directives d, dba_sql_plan_dir_objects o 3 where d.DIRECTIVE_ID=o.DIRECTIVE_ID 4 and o.object_name in ('ORDER_ITEMS') 5 order by d.directive_id; DIRECTIVE_ID OBJECT_TYPE OBJECT_NAME COL_NAME TYPE STATE REASON ------------ ------------ ------------- ----------- -------------------------------- ----- ------------------------------------- --- 1.8156E+19 COLUMN ORDER_ITEMS UNIT_PRICE DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE 1.8156E+19 TABLE ORDER_ITEMS DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE 1.8156E+19 COLUMN ORDER_ITEMS QUANTITY DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE DBA_SQL_PLAN_DIRECTIVES????? _BASE_OPT_DIRECTIVE ? _BASE_OPT_FINDING SELECT d.dir_own#, d.dir_id, d.f_id, decode(type, 1, 'DYNAMIC_SAMPLING', 'UNKNOWN'), decode(state, 1, 'NEW', 2, 'MISSING_STATS', 3, 'HAS_STATS', 4, 'CANDIDATE', 5, 'PERMANENT', 6, 'DISABLED', 'UNKNOWN'), decode(bitand(flags, 1), 1, 'YES', 'NO'), cast(d.created as timestamp), cast(d.last_modified as timestamp), -- Please see QOSD_DAYS_TO_UPDATE and QOSD_PLUS_SECONDS for more details -- about 6.5 cast(d.last_used as timestamp) - NUMTODSINTERVAL(6.5, 'day') FROM sys.opt_directive$ d ??dbms_spd??? SQL PLAN DIRECTIVES, SQL PLAN DIRECTIVES???retention ???53?: Package: DBMS_SPD This package provides subprograms for managing Sql Plan Directives(SPD). SPD are objects generated automatically by Oracle server. For example, if server detects that the single table cardinality estimated by optimizer is off from the actual number of rows returned when accessing the table, it will automatically create a directive to do dynamic sampling for the table. When any Sql statement referencing the table is compiled, optimizer will perform dynamic sampling for the table to get more accurate estimate. Notes: DBMSL_SPD is a invoker-rights package. The invoker requires ADMINISTER SQL MANAGEMENT OBJECT privilege for executing most of the subprograms of this package. Also the subprograms commit the current transaction (if any), perform the operation and commit it again. DBA view dba_sql_plan_directives shows all the directives created in the system and the view dba_sql_plan_dir_objects displays the objects that are included in the directives. -- Default value for SPD_RETENTION_WEEKS SPD_RETENTION_WEEKS_DEFAULT CONSTANT varchar2(4) := '53'; | STATE : NEW : Newly created directive. | : MISSING_STATS : The directive objects do not | have relevant stats. | : HAS_STATS : The objects have stats. | : PERMANENT : A permanent directive. Server | evaluated effectiveness and these | directives are useful. | | AUTO_DROP : YES : Directive will be dropped | automatically if not | used for SPD_RETENTION_WEEKS. | This is the default behavior. | NO : Directive will not be dropped | automatically. Procedure: flush_sql_plan_directive This procedure allows manually flushing the Sql Plan directives that are automatically recorded in SGA memory while executing sql statements. The information recorded in SGA are periodically flushed by oracle background processes. This procedure just provides a way to flush the information manually. ????”_optimizer_dynamic_plans”(enable dynamic plans)????????,???TRUE??DYNAMIC PLAN? ???FALSE???????????? ????,Dynamic Plan????????????Nested Loop?Hash Join???case ,????????Nested loop???????????HASH JOIN,?HASH JOIN????????????????? ????????subplan?????,???? pass?? ?join method???,?????STATISTICS COLLECTOR???cardinality?,???????HASH JOIN?????Nested Loop,????????????subplan?????access path; ???????Sales??????????????????,????HASH JOIN,??SUBPLAN??customers?????????;?????Nested Loop,???????cust_id?????Range Scan+Access by Rowid? Cardinality feedback Cardinality feedback????????11.2????,????????re-optimization???; ???????????,Cardinality feedback?????????????????????????? ???????????????????,?????????????????,??????????Cardinality feedback????????????? ????????????????????????? ??????????????Cardinality feedback ??: ????????,???????????,??????????,????????????????selectivity ??? ????????????: ??????,?????????????????????????????????,??????????????????? ????????????????????????????????????????,?????????????????????????? ?????????,???????????????,?????????? ??????????Cardinality ????,??????join Cardinality ????????? Cardinality feedback???????cursor?,?Cursor???aged out????? SELECT /*+ gather_plan_statistics */ product_name FROM order_items o, product_information p WHERE o.unit_price = 15 AND quantity > 1 AND p.product_id = o.product_id Plan hash value: 1553478007 ---------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 13 |00:00:00.01 | 24 | 20 | | | | |* 1 | HASH JOIN | | 1 | 4 | 13 |00:00:00.01 | 24 | 20 | 2061K| 2061K| 429K (0)| |* 2 | TABLE ACCESS FULL| ORDER_ITEMS | 1 | 4 | 13 |00:00:00.01 | 7 | 6 | | | | | 3 | TABLE ACCESS FULL| PRODUCT_INFORMATION | 1 | 1 | 288 |00:00:00.01 | 17 | 14 | | | | ---------------------------------------------------------------------------------------------------------------------------------------- SELECT /*+ gather_plan_statistics */ product_name FROM order_items o, product_information p WHERE o.unit_price = 15 AND quantity > 1 AND p.product_id = o.product_id Plan hash value: 1553478007 ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 13 |00:00:00.01 | 24 | | | | |* 1 | HASH JOIN | | 1 | 13 | 13 |00:00:00.01 | 24 | 2061K| 2061K| 413K (0)| |* 2 | TABLE ACCESS FULL| ORDER_ITEMS | 1 | 13 | 13 |00:00:00.01 | 7 | | | | | 3 | TABLE ACCESS FULL| PRODUCT_INFORMATION | 1 | 288 | 288 |00:00:00.01 | 17 | | | | ------------------------------------------------------------------------------------------------------------------------------- Note ----- - statistics feedback used for this statement SQL> select count(*) from v$SQL where SQL_ID='cz0hg2zkvd10y'; COUNT(*) ---------- 2 SQL>select sql_ID,USE_FEEDBACK_STATS FROM V$SQL_SHARED_CURSOR where USE_FEEDBACK_STATS ='Y'; SQL_ID U ------------- - cz0hg2zkvd10y Y ????????Cardinality feedback????,???????????????????????????,????????????order_items???????? ????2??????plan hash value??(??????????),?????2????child cursor??????gather_plan_statistics???actual : A-ROWS estimate :E-ROWS????????? Automatic Re-optimization ???dynamic plan, Re-optimization??????????????? ? ??????????????? ???????????????????????????????? ???????????,??????????????, ???????????????????? ??????????? Re-optimization??, ????????????????????? Re-optimization????dynamic plan?????????? dynamic plan????????????????????, ???????????????????? ????,??????????join order ??????????????,?????????????join order????? ??????,????????Re-optimization, ??Re-optimization ??????????????????? ?Oracle database 12c?,join statistics?????????????????????,??????????????????????Re-optimization???????????adaptive cursor sharing????? ????????????????,???????????? ????? ???????statistics collectors ????????????????????Re-optimization??????2?????????????,???????????????? ??????????????Re-optimization?????,?????????????????????? ???v$SQL??????IS_REOPTIMIZABLE?????????????????????Re-optimization,??????????Re-optimization???,?????Re-optimization ,???????reporting????? IS_REOPTIMIZABLE VARCHAR2(1) This columns shows whether the next execution matching this child cursor will trigger a reoptimization. The values are: Y: If the next execution will trigger a reoptimization R: If the child cursor contains reoptimization information, but will not trigger reoptimization because the cursor was compiled in reporting mode N: If the child cursor has no reoptimization information ??1: select plan_table_output from table (dbms_xplan.display_cursor('gwf99gfnm0t7g',NULL,'ALLSTATS LAST')); SQL_ID gwf99gfnm0t7g, child number 0 ------------------------------------- SELECT /*+ SFTEST gather_plan_statistics */ o.order_id, v.product_name FROM orders o, ( SELECT order_id, product_name FROM order_items o, product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id Plan hash value: 1906736282 ------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 269 |00:00:00.02 | 1336 | 18 | | | | | 1 | NESTED LOOPS | | 1 | 1 | 269 |00:00:00.02 | 1336 | 18 | | | | | 2 | MERGE JOIN CARTESIAN| | 1 | 4 | 9135 |00:00:00.02 | 34 | 15 | | | | |* 3 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 1 | 87 |00:00:00.01 | 33 | 14 | | | | | 4 | BUFFER SORT | | 87 | 105 | 9135 |00:00:00.01 | 1 | 1 | 4096 | 4096 | 4096 (0)| | 5 | INDEX FULL SCAN | ORDER_PK | 1 | 105 | 105 |00:00:00.01 | 1 | 1 | | | | |* 6 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 9135 | 1 | 269 |00:00:00.01 | 1302 | 3 | | | | ------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50)) 6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID") SQL_ID gwf99gfnm0t7g, child number 1 ------------------------------------- SELECT /*+ SFTEST gather_plan_statistics */ o.order_id, v.product_name FROM orders o, ( SELECT order_id, product_name FROM order_items o, product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id Plan hash value: 35479787 -------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | -------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 269 |00:00:00.01 | 63 | 3 | | | | | 1 | NESTED LOOPS | | 1 | 269 | 269 |00:00:00.01 | 63 | 3 | | | | |* 2 | HASH JOIN | | 1 | 313 | 269 |00:00:00.01 | 42 | 3 | 1321K| 1321K| 1234K (0)| |* 3 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 87 | 87 |00:00:00.01 | 16 | 0 | | | | | 4 | INDEX FAST FULL SCAN| ORDER_ITEMS_UK | 1 | 665 | 665 |00:00:00.01 | 26 | 3 | | | | |* 5 | INDEX UNIQUE SCAN | ORDER_PK | 269 | 1 | 269 |00:00:00.01 | 21 | 0 | | | | -------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID") 3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50)) 5 - access("O"."ORDER_ID"="ORDER_ID") Note ----- - statistics feedback used for this statement SQL> select IS_REOPTIMIZABLE,child_number FROM V$SQL A where A.SQL_ID='gwf99gfnm0t7g'; IS CHILD_NUMBER -- ------------ Y 0 N 1 1* select child_number,other_xml From v$SQL_PLAN where SQL_ID='gwf99gfnm0t7g' and other_xml is not nul SQL> / CHILD_NUMBER OTHER_XML ------------ -------------------------------------------------------------------------------- 1 <other_xml><info type="cardinality_feedback">yes</info><info type="db_version">1 2.1.0.1</info><info type="parse_schema"><![CDATA["OE"]]></info><info type="plan_ hash">35479787</info><info type="plan_hash_2">3382491761</info><outline_data><hi nt><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint><hint><![CDATA[OPTIMIZER_FEATUR ES_ENABLE('12.1.0.1')]]></hint><hint><![CDATA[DB_VERSION('12.1.0.1')]]></hint><h int><![CDATA[ALL_ROWS]]></hint><hint><![CDATA[OUTLINE_LEAF(@"SEL$F5BB74E1")]]></ hint><hint><![CDATA[MERGE(@"SEL$2")]]></hint><hint><![CDATA[OUTLINE(@"SEL$1")]]> </hint><hint><![CDATA[OUTLINE(@"SEL$2")]]></hint><hint><![CDATA[FULL(@"SEL$F5BB7 4E1" "P"@"SEL$2")]]></hint><hint><![CDATA[INDEX_FFS(@"SEL$F5BB74E1" "O"@"SEL$2" ("ORDER_ITEMS"."ORDER_ID" "ORDER_ITEMS"."PRODUCT_ID"))]]></hint><hint><![CDATA[I NDEX(@"SEL$F5BB74E1" "O"@"SEL$1" ("ORDERS"."ORDER_ID"))]]></hint><hint><![CDATA[ LEADING(@"SEL$F5BB74E1" "P"@"SEL$2" "O"@"SEL$2" "O"@"SEL$1")]]></hint><hint><![C DATA[USE_HASH(@"SEL$F5BB74E1" "O"@"SEL$2")]]></hint><hint><![CDATA[USE_NL(@"SEL$ F5BB74E1" "O"@"SEL$1")]]></hint></outline_data></other_xml> 0 <other_xml><info type="db_version">12.1.0.1</info><info type="parse_schema"><![C DATA["OE"]]></info><info type="plan_hash">1906736282</info><info type="plan_hash _2">2579473118</info><outline_data><hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]> </hint><hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('12.1.0.1')]]></hint><hint><![CD ATA[DB_VERSION('12.1.0.1')]]></hint><hint><![CDATA[ALL_ROWS]]></hint><hint><![CD ATA[OUTLINE_LEAF(@"SEL$F5BB74E1")]]></hint><hint><![CDATA[MERGE(@"SEL$2")]]></hi nt><hint><![CDATA[OUTLINE(@"SEL$1")]]></hint><hint><![CDATA[OUTLINE(@"SEL$2")]]> </hint><hint><![CDATA[FULL(@"SEL$F5BB74E1" "P"@"SEL$2")]]></hint><hint><![CDATA[ INDEX(@"SEL$F5BB74E1" "O"@"SEL$1" ("ORDERS"."ORDER_ID"))]]></hint><hint><![CDATA [INDEX(@"SEL$F5BB74E1" "O"@"SEL$2" ("ORDER_ITEMS"."ORDER_ID" "ORDER_ITEMS"."PROD UCT_ID"))]]></hint><hint><![CDATA[LEADING(@"SEL$F5BB74E1" "P"@"SEL$2" "O"@"SEL$1 " "O"@"SEL$2")]]></hint><hint><![CDATA[USE_MERGE_CARTESIAN(@"SEL$F5BB74E1" "O"@" SEL$1")]]></hint><hint><![CDATA[USE_NL(@"SEL$F5BB74E1" "O"@"SEL$2")]]></hint></o utline_data></other_xml> ??2: SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US'; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 | 14 | |* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 8 | 29 |00:00:00.01 | 17 | 14 | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US')) SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US' EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OWNER OBJECT_NAME COL_NAME OBJECT TYPE STATE REASON ----------------------- ----- ------------- ----------- ------ ---------------- ----- ------------------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY PROVINCE MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US'; ELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 1 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 | |* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 29 | 29 |00:00:00.01 | 17 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US')) Note ----- - cardinality feedback used for this statement SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US' b74nw722wjvy3 1 select /*+g N ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US' SELECT /*+gather_plan_statistics*/ CUST_EMAIL FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='MA' AND COUNTRY_ID='US'; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 3tk6hj3nkcs2u, child number 0 ------------------------------------- Select /*+gather_plan_statistics*/ cust_email From customers Where cust_state_province='MA' And country_id='US' Plan hash value: 1683234692 ------------------------------------------------------------------------------- |Id | Operation | Name | Starts|E-Rows|A-Rows| A-Time |Buffers| ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01| 16 | |*1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 2| 2 |00:00:00.01| 16 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='MA' AND "COUNTRY_ID"='US')) Note ----- - dynamic sampling used for this statement (level=2) - 1 Sql Plan Directive used for this statement EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OW OBJECT_NA COL_NAME OBJECT TYPE STATE REASON ------------------- -- --------- ---------- ------- --------------- ------------- ------------------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY PROVINCE MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY MISESTIMATE

Read the article

Troubleshooting Application Timeouts in SQL Server

- by Tara Kizer

I recently received the following email from a blog reader: "We are having an OLTP database instance, using SQL Server 2005 with little to moderate traffic (10-20 requests/min). There are also bulk imports that occur at regular intervals in this DB and the import duration ranges between 10secs to 1 min, depending on the data size. Intermittently (2-3 times in a week), we face an issue, where queries get timed out (default of 30 secs set in application). On analyzing, we found two stored procedures, having queries with multiple table joins inside them of taking a long time (5-10 mins) in getting executed, when ideally the execution duration ranges between 5-10 secs. Execution plan of the same displayed Clustered Index Scan happening instead of Clustered Index Seek. All required Indexes are found to be present and Index fragmentation is also minimal as we Rebuild Indexes regularly alongwith Updating Statistics. With no other alternate options occuring to us, we restarted SQL server and thereafter the performance was back on track. But sometimes it was still giving timeout errors for some hits and so we also restarted IIS and that stopped the problem as of now." Rather than respond directly to the blog reader, I thought it would be more interesting to share my thoughts on this issue in a blog. There are a few things that I can think of that could cause abnormal timeouts: Blocking Bad plan in cache Outdated statistics Hardware bottleneck To determine if blocking is the issue, we can easily run sp_who/sp_who2 or a query directly on sysprocesses (select * from master..sysprocesses where blocking <> 0). If blocking is present and consistent, then you'll need to determine whether or not to kill the parent blocking process. Killing a process will cause the transaction to rollback, so you need to proceed with caution. Killing the parent blocking process is only a temporary solution, so you'll need to do more thorough analysis to figure out why the blocking was present. You should look into missing indexes and perhaps consider changing the database's isolation level to READ_COMMITTED_SNAPSHOT. The blog reader mentions that the execution plan shows a clustered index scan when a clustered index seek is normal for the stored procedure. A clustered index scan might have been chosen either because that is what is in cache already or because of out of date statistics. The blog reader mentions that bulk imports occur at regular intervals, so outdated statistics is definitely something that could cause this issue. The blog reader may need to update statistics after imports are done if the imports are changing a lot of data (greater than 10%). If the statistics are good, then the query optimizer might have chosen to scan rather than seek in a previous execution because the scan was determined to be less costly due to the value of an input parameter. If this parameter value is rare, then its execution plan in cache is what we call a bad plan. You want the best plan in cache for the most frequent parameter values. If a bad plan is a recurring problem on your system, then you should consider rewriting the stored procedure. You might want to break up the code into multiple stored procedures so that each can have a different execution plan in cache. To remove a bad plan from cache, you can recompile the stored procedure. An alternative method is to run DBCC FREEPROCACHE which drops the procedure cache. It is better to recompile stored procedures rather than dropping the procedure cache as dropping the procedure cache affects all plans in cache rather than just the ones that were bad, so there will be a temporary performance penalty until the plans are loaded into cache again. To determine if there is a hardware bottleneck occurring such as slow I/O or high CPU utilization, you will need to run Performance Monitor on the database server. Hopefully you already have a baseline of the server so you know what is normal and what is not. Be on the lookout for I/O requests taking longer than 12 milliseconds and CPU utilization over 90%. The servers that I support typically are under 30% CPU utilization, but your baseline could be higher and be within a normal range. If restarting the SQL Server service fixes the problem, then the problem was most likely due to blocking or a bad plan in the procedure cache. Rather than restarting the SQL Server service, which causes downtime, the blog reader should instead analyze the above mentioned things. Proceed with caution when restarting the SQL Server service as all transactions that have not completed will be rolled back at startup. This crash recovery process could take longer than normal if there was a long-running transaction running when the service was stopped. Until the crash recovery process is completed on the database, it is unavailable to your applications. If restarting IIS fixes the problem, then the problem might not have been inside SQL Server. Prior to taking this step, you should do analysis of the above mentioned things. If you can think of other reasons why the blog reader is facing this issue a few times a week, I'd love to hear your thoughts via a blog comment.

Read the article

NET Math Libraries

- by JoshReuben

NET Mathematical Libraries .NET Builder for Matlab The MathWorks Inc. - http://www.mathworks.com/products/netbuilder/ MATLAB Builder NE generates MATLAB based .NET and COM components royalty-free deployment creates the components by encrypting MATLAB functions and generating either a .NET or COM wrapper around them. .NET/Link for Mathematica www.wolfram.com a product that 2-way integrates Mathematica and Microsoft's .NET platform call .NET from Mathematica - use arbitrary .NET types directly from the Mathematica language. use and control the Mathematica kernel from a .NET program. turns Mathematica into a scripting shell to leverage the computational services of Mathematica. write custom front ends for Mathematica or use Mathematica as a computational engine for another program comes with full source code. Leverages MathLink - a Wolfram Research's protocol for sending data and commands back and forth between Mathematica and other programs. .NET/Link abstracts the low-level details of the MathLink C API. Extreme Optimization http://www.extremeoptimization.com/ a collection of general-purpose mathematical and statistical classes built for the.NET framework. It combines a math library, a vector and matrix library, and a statistics library in one package. download the trial of version 4.0 to try it out. Multi-core ready - Full support for Task Parallel Library features including cancellation. Broad base of algorithms covering a wide range of numerical techniques, including: linear algebra (BLAS and LAPACK routines), numerical analysis (integration and differentiation), equation solvers. Mathematics leverages parallelism using .NET 4.0's Task Parallel Library. Basic math: Complex numbers, 'special functions' like Gamma and Bessel functions, numerical differentiation. Solving equations: Solve equations in one variable, or solve systems of linear or nonlinear equations. Curve fitting: Linear and nonlinear curve fitting, cubic splines, polynomials, orthogonal polynomials. Optimization: find the minimum or maximum of a function in one or more variables, linear programming and mixed integer programming. Numerical integration: Compute integrals over finite or infinite intervals, over 2D and higher dimensional regions. Integrate systems of ordinary differential equations (ODE's). Fast Fourier Transforms: 1D and 2D FFT's using managed or fast native code (32 and 64 bit) BigInteger, BigRational, and BigFloat: Perform operations with arbitrary precision. Vector and Matrix Library Real and complex vectors and matrices. Single and double precision for elements. Structured matrix types: including triangular, symmetrical and band matrices. Sparse matrices. Matrix factorizations: LU decomposition, QR decomposition, singular value decomposition, Cholesky decomposition, eigenvalue decomposition. Portability and performance: Calculations can be done in 100% managed code, or in hand-optimized processor-specific native code (32 and 64 bit). Statistics Data manipulation: Sort and filter data, process missing values, remove outliers, etc. Supports .NET data binding. Statistical Models: Simple, multiple, nonlinear, logistic, Poisson regression. Generalized Linear Models. One and two-way ANOVA. Hypothesis Tests: 12 14 hypothesis tests, including the z-test, t-test, F-test, runs test, and more advanced tests, such as the Anderson-Darling test for normality, one and two-sample Kolmogorov-Smirnov test, and Levene's test for homogeneity of variances. Multivariate Statistics: K-means cluster analysis, hierarchical cluster analysis, principal component analysis (PCA), multivariate probability distributions. Statistical Distributions: 25 29 continuous and discrete statistical distributions, including uniform, Poisson, normal, lognormal, Weibull and Gumbel (extreme value) distributions. Random numbers: Random variates from any distribution, 4 high-quality random number generators, low discrepancy sequences, shufflers. New in version 4.0 (November, 2010) Support for .NET Framework Version 4.0 and Visual Studio 2010 TPL Parallellized – multicore ready sparse linear program solver - can solve problems with more than 1 million variables. Mixed integer linear programming using a branch and bound algorithm. special functions: hypergeometric, Riemann zeta, elliptic integrals, Frensel functions, Dawson's integral. Full set of window functions for FFT's. Product Price Update subscription Single Developer License $999 $399 Team License (3 developers) $1999 $799 Department License (8 developers) $3999 $1599 Site License (Unlimited developers in one physical location) $7999 $3199 NMath http://www.centerspace.net .NET math and statistics libraries matrix and vector classes random number generators Fast Fourier Transforms (FFTs) numerical integration linear programming linear regression curve and surface fitting optimization hypothesis tests analysis of variance (ANOVA) probability distributions principal component analysis cluster analysis built on the Intel Math Kernel Library (MKL), which contains highly-optimized, extensively-threaded versions of BLAS (Basic Linear Algebra Subroutines) and LAPACK (Linear Algebra PACKage). Product Price Update subscription Single Developer License $1295 $388 Team License (5 developers) $5180 $1554 DotNumerics http://www.dotnumerics.com/NumericalLibraries/Default.aspx free DotNumerics is a website dedicated to numerical computing for .NET that includes a C# Numerical Library for .NET containing algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack, ports from Fortran to C# of LAPACK, BLAS and EISPACK, respectively. Linear Algebra (CSLapack, CSBlas and CSEispack). Systems of linear equations, eigenvalue problems, least-squares solutions of linear systems and singular value problems. Differential Equations. Initial-value problem for nonstiff and stiff ordinary differential equations ODEs (explicit Runge-Kutta, implicit Runge-Kutta, Gear's BDF and Adams-Moulton). Optimization. Unconstrained and bounded constrained optimization of multivariate functions (L-BFGS-B, Truncated Newton and Simplex methods). Math.NET Numerics http://numerics.mathdotnet.com/ free an open source numerical library - includes special functions, linear algebra, probability models, random numbers, interpolation, integral transforms. A merger of dnAnalytics with Math.NET Iridium in addition to a purely managed implementation will also support native hardware optimization. constants & special functions complex type support real and complex, dense and sparse linear algebra (with LU, QR, eigenvalues, ... decompositions) non-uniform probability distributions, multivariate distributions, sample generation alternative uniform random number generators descriptive statistics, including order statistics various interpolation methods, including barycentric approaches and splines numerical function integration (quadrature) routines integral transforms, like fourier transform (FFT) with arbitrary lengths support, and hartley spectral-space aware sequence manipulation (signal processing) combinatorics, polynomials, quaternions, basic number theory. parallelized where appropriate, to leverage multi-core and multi-processor systems fully managed or (if available) using native libraries (Intel MKL, ACMS, CUDA, FFTW) provides a native facade for F# developers

Read the article

Getting started with Oracle Database In-Memory Part III - Querying The IM Column Store

- by Maria Colgan

In my previous blog posts, I described how to install, enable, and populate the In-Memory column store (IM column store). This weeks post focuses on how data is accessed within the IM column store. Let’s take a simple query “What is the most expensive air-mail order we have received to date?” SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE lo_shipmode = 5; The LINEORDER table has been populated into the IM column store and since we have no alternative access paths (indexes or views) the execution plan for this query is a full table scan of the LINEORDER table. You will notice that the execution plan has a new set of keywords “IN MEMORY" in the access method description in the Operation column. These keywords indicate that the LINEORDER table has been marked for INMEMORY and we may use the IM column store in this query. What do I mean by “may use”? There are a small number of cases were we won’t use the IM column store even though the object has been marked INMEMORY. This is similar to how the keyword STORAGE is used on Exadata environments. You can confirm that the IM column store was actually used by examining the session level statistics, but more on that later. For now let's focus on how the data is accessed in the IM column store and why it’s faster to access the data in the new column format, for analytical queries, rather than the buffer cache. There are four main reasons why accessing the data in the IM column store is more efficient. 1. Access only the column data needed The IM column store only has to scan two columns – lo_shipmode and lo_ordtotalprice – to execute this query while the traditional row store or buffer cache has to scan all of the columns in each row of the LINEORDER table until it reaches both the lo_shipmode and the lo_ordtotalprice column. 2. Scan and filter data in it's compressed format When data is populated into the IM column it is automatically compressed using a new set of compression algorithms that allow WHERE clause predicates to be applied against the compressed formats. This means the volume of data scanned in the IM column store for our query will be far less than the same query in the buffer cache where it will scan the data in its uncompressed form, which could be 20X larger. 3. Prune out any unnecessary data within each column The fastest read you can execute is the read you don’t do. In the IM column store a further reduction in the amount of data accessed is possible due to the In-Memory Storage Indexes(IM storage indexes) that are automatically created and maintained on each of the columns in the IM column store. IM storage indexes allow data pruning to occur based on the filter predicates supplied in a SQL statement. An IM storage index keeps track of minimum and maximum values for each column in each of the In-Memory Compression Unit (IMCU). In our query the WHERE clause predicate is on the lo_shipmode column. The IM storage index on the lo_shipdate column is examined to determine if our specified column value 5 exist in any IMCU by comparing the value 5 to the minimum and maximum values maintained in the Storage Index. If the value 5 is outside the minimum and maximum range for an IMCU, the scan of that IMCU is avoided. For the IMCUs where the value 5 does fall within the min, max range, an additional level of data pruning is possible via the metadata dictionary created when dictionary-based compression is used on IMCU. The dictionary contains a list of the unique column values within the IMCU. Since we have an equality predicate we can easily determine if 5 is one of the distinct column values or not. The combination of the IM storage index and dictionary based pruning, enables us to only scan the necessary IMCUs. 4. Use SIMD to apply filter predicates For the IMCU that need to be scanned Oracle takes advantage of SIMD vector processing (Single Instruction processing Multiple Data values). Instead of evaluating each entry in the column one at a time, SIMD vector processing allows a set of column values to be evaluated together in a single CPU instruction. The column format used in the IM column store has been specifically designed to maximize the number of column entries that can be loaded into the vector registers on the CPU and evaluated in a single CPU instruction. SIMD vector processing enables the Oracle Database In-Memory to scan billion of rows per second per core versus the millions of rows per second per core scan rate that can be achieved in the buffer cache. I mentioned earlier in this post that in order to confirm the IM column store was used; we need to examine the session level statistics. You can monitor the session level statistics by querying the performance views v$mystat and v$statname. All of the statistics related to the In-Memory Column Store begin with IM. You can see the full list of these statistics by typing: display_name format a30 SELECT display_name FROM v$statname WHERE display_name LIKE 'IM%'; If we check the session statistics after we execute our query the results would be as follow; SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE lo_shipmode = 5; SELECT display_name FROM v$statname WHERE display_name IN ('IM scan CUs columns accessed', 'IM scan segments minmax eligible', 'IM scan CUs pruned'); As you can see, only 2 IMCUs were accessed during the scan as the majority of the IMCUs (44) in the LINEORDER table were pruned out thanks to the storage index on the lo_shipmode column. In next weeks post I will describe how you can control which queries use the IM column store and which don't. +Maria Colgan

Read the article

Winnipeg SQL Server UG April Event – How To Do An Index Review

- by D'Arcy Lussier

April Event - How to Do an Index Review April 14th, 2010 5:30 - 8:00 17th Floor Conference Room, Richardson Building One Lombard Place, Winnipeg Pizza and Drinks Provided! Did you know that SQL Server 2005+ keeps query execution statistics, index usage statistics and even missing index statistics? Learn how to access this information and use it to help you make good decisions about what your database really needs in terms of indexes in a lot less time than you might think an index review should take. There are 6 or 7 (depending on your version of SQL server) DMVs (dynamic management views) to look at which reveal a lot about your database and how you can improve its performance. To register for this event, please click HERE to register!

Read the article

Sharepoint SSRS: Unexpected Error Encountered

- by Nicholas

We are running an intranet website hosted on Sharepoint 2007, serving reports to users via SSRS. Recently, our users are experiencing error when trying to access certain reports with the error message "An unexpected error has occurred". After trying many things, to cut a long story short we manage to temporarily solve the problem by manually running UPDATE STATISTICS on the tables in the database. However, this error will periodically crop up again and again, running UPDATE STATISTICS immediately solved it. I had suggested to run a daily job in the server to auto run UPDATE STATISTICS every few hours or so but was rejected by my team lead. Is there any other workaround for this error? Why does UPDATE STATISTICS solve the problem, I still not very understand?

Read the article

block write access to table from an application in mysql

- by hoberion

Hello, We have a CMS plugin that writes statistics to 1 table, this creates performance issues on the entire platform. We decided to use another statistics plugin which can connect to a different database server (the first plugin couldn't!) however we need parts of the first plugin. I want to lock the statistics table to prevent misusage (not allowed to drop it by the developer) So I was wondering if a lock table could do this or if I can implement some sort of read only table

Read the article

JMX Based Monitoring - Part Two - JVM Monitoring

- by Anthony Shorten

This the second article in the series focussing on the JMX based monitoring capabilities possible with the Oracle Utilities Application Framework. In all versions of the Oracle utilities Application Framework, it is possible to use the basic JMX based monitoring available with the Java Virtual Machine to provide basic statistics ablut the JVM. In Java 5 and above, the JVM automatically allowed local monitoring of the JVM statistics from an approporiate console. When I say local I mean the monitoring tool must be executed from the same machine (and in some cases the same user that is running the JVM) to connect to the JVM directly. If you are using jconsole, for example, then you must have access to a GUI (X-Windows or Windows) to display the jconsole output. This is the easist way of monitoring without doing too much configration but is not always practical. Java offers a remote monitorig capability to allow yo to connect to a remotely executing JVM from a console (like jconsole). To use this facility additional JVM options must be added to the command line that started the JVM. Details of the additional options for the version of the Java you are running is located at the JMX information site. Typically to remotely connect to a running JVM that JVM must be configured with the following categories of options: JMX Port - The JVM must allow connections on a listening port specified on the command line Connection security - The connection to the JVM can be secured. This is recommended as JMX is not just a monitoring protocol it is a managemet protocol. It is possible to change values in a running JVM using JMX and there are NO "Are you sure?" safeguards. For a Oracle Utilities Application Framework based application there are a few guidelines when configuring and using this JMX based remote monitoring of the JVM's: Online JVM - The JVM used to run the online system is embedded within the J2EE Web Application Server. To enable JMX monitoring on this JVM you can either change the startup script that starts the Web Application Server or check whether your J2EE Web Application natively supports JVM statistics collection. Child JVM's (COBOL only) - The Child JVM's should not be monitored using this method as they are recycled regularly by the configuration and therefore statistics collected are of little value. Batch Threadpoools - Batch already has a JMX interface (which will be covered in another article). Additional monitoring can be enabled but the base supported monitoring is sufficient for most needs. If you are an Oracle Utilities Application Framework site, then you can specify the additional options for JMX Java monitoring on the OPTS paramaters supported for each component of the architecture. Just ensure the port numbers used are unique for each JVM running on any machine.

Read the article

ZFS for Database Log Files

- by user12620111

I've been troubled by drop outs in CPU usage in my application server, characterized by the CPUs suddenly going from close to 90% CPU busy to almost completely CPU idle for a few seconds. Here is an example of a drop out as shown by a snippet of vmstat data taken while the application server is under a heavy workload. # vmstat 1 kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr s3 s4 s5 s6 in sy cs us sy id 1 0 0 130160176 116381952 0 16 0 0 0 0 0 0 0 0 0 207377 117715 203884 70 21 9 12 0 0 130160160 116381936 0 25 0 0 0 0 0 0 0 0 0 200413 117162 197250 70 20 9 11 0 0 130160176 116381920 0 16 0 0 0 0 0 0 1 0 0 203150 119365 200249 72 21 7 8 0 0 130160176 116377808 0 19 0 0 0 0 0 0 0 0 0 169826 96144 165194 56 17 27 0 0 0 130160176 116377800 0 16 0 0 0 0 0 0 0 0 1 10245 9376 9164 2 1 97 0 0 0 130160176 116377792 0 16 0 0 0 0 0 0 0 0 2 15742 12401 14784 4 1 95 0 0 0 130160176 116377776 2 16 0 0 0 0 0 0 1 0 0 19972 17703 19612 6 2 92 14 0 0 130160176 116377696 0 16 0 0 0 0 0 0 0 0 0 202794 116793 199807 71 21 8 9 0 0 130160160 116373584 0 30 0 0 0 0 0 0 18 0 0 203123 117857 198825 69 20 11 This behavior occurred consistently while the application server was processing synthetic transactions: HTTP requests from JMeter running on an external machine. I explored many theories trying to explain the drop outs, including: Unexpected JMeter behavior Network contention Java Garbage Collection Application Server thread pool problems Connection pool problems Database transaction processing Database I/O contention Graphing the CPU %idle led to a breakthrough: Several of the drop outs were 30 seconds apart. With that insight, I went digging through the data again and looking for other outliers that were 30 seconds apart. In the database server statistics, I found spikes in the iostat "asvc_t" (average response time of disk transactions, in milliseconds) for the disk drive that was being used for the database log files. Here is an example: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 2053.6 0.0 8234.3 0.0 0.2 0.0 0.1 0 24 c3t60080E5...F4F6d0s0 0.0 2162.2 0.0 8652.8 0.0 0.3 0.0 0.1 0 28 c3t60080E5...F4F6d0s0 0.0 1102.5 0.0 10012.8 0.0 4.5 0.0 4.1 0 69 c3t60080E5...F4F6d0s0 0.0 74.0 0.0 7920.6 0.0 10.0 0.0 135.1 0 100 c3t60080E5...F4F6d0s0 0.0 568.7 0.0 6674.0 0.0 6.4 0.0 11.2 0 90 c3t60080E5...F4F6d0s0 0.0 1358.0 0.0 5456.0 0.0 0.6 0.0 0.4 0 55 c3t60080E5...F4F6d0s0 0.0 1314.3 0.0 5285.2 0.0 0.7 0.0 0.5 0 70 c3t60080E5...F4F6d0s0 Here is a little more information about my database configuration: The database and application server were running on two different SPARC servers. Storage for the database was on a storage array connected via 8 gigabit Fibre Channel Data storage and log file were on different physical disk drives Reliable low latency I/O is provided by battery backed NVRAM Highly available: Two Fibre Channel links accessed via MPxIO Two Mirrored cache controllers The log file physical disks were mirrored in the storage device Database log files on a ZFS Filesystem with cutting-edge technologies, such as copy-on-write and end-to-end checksumming Why would I be getting service time spikes in my high-end storage? First, I wanted to verify that the database log disk service time spikes aligned with the application server CPU drop outs, and they did: At first, I guessed that the disk service time spikes might be related to flushing the write through cache on the storage device, but I was unable to validate that theory. After searching the WWW for a while, I decided to try using a separate log device: # zpool add ZFS-db-41 log c3t60080E500017D55C000015C150A9F8A7d0 The ZFS log device is configured in a similar manner as described above: two physical disks mirrored in the storage array. This change to the database storage configuration eliminated the application server CPU drop outs: Here is the zpool configuration: # zpool status ZFS-db-41 pool: ZFS-db-41 state: ONLINE scan: none requested config: NAME STATE ZFS-db-41 ONLINE c3t60080E5...F4F6d0 ONLINE logs c3t60080E5...F8A7d0 ONLINE Now, the I/O spikes look like this: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1053.5 0.0 4234.1 0.0 0.8 0.0 0.7 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1131.8 0.0 4555.3 0.0 0.8 0.0 0.7 0 76 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1167.6 0.0 4682.2 0.0 0.7 0.0 0.6 0 74 c3t60080E5...F8A7d0s0 0.0 162.2 0.0 19153.9 0.0 0.7 0.0 4.2 0 12 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1247.2 0.0 4992.6 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 0.0 41.0 0.0 70.0 0.0 0.1 0.0 1.6 0 2 c3t60080E5...F4F6d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1241.3 0.0 4989.3 0.0 0.8 0.0 0.6 0 75 c3t60080E5...F8A7d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1193.2 0.0 4772.9 0.0 0.7 0.0 0.6 0 71 c3t60080E5...F8A7d0s0 We can see the steady flow of 4k writes to the ZIL device from O_SYNC database log file writes. The spikes are from flushing the transaction group. Like almost all problems that I run into, once I thoroughly understand the problem, I find that other people have documented similar experiences. Thanks to all of you who have documented alternative approaches. Saved for another day: now that the problem is obvious, I should try "zfs:zfs_immediate_write_sz" as recommended in the ZFS Evil Tuning Guide. References: The ZFS Intent Log Solaris ZFS, Synchronous Writes and the ZIL Explained ZFS Evil Tuning Guide: Cache Flushes ZFS Evil Tuning Guide: Tuning ZFS for Database Performance

Read the article

SQL SERVER Disabled Index and UpdateStatistics

When we try to update the statistics, it throws an error as if the clustered index is disabled. Now let us enable the clustered index only and attempt to update the statistics of the table right after that. Have you ever come across the situation where a conversation never gets over and it continues even [...]...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article

How do you monitor SSD wear in Windows when the drives are presented as 'generic' devices?

- by MikeyB

Under Linux, we can monitor SSD wear fairly easily with smartmontools whether the drive is presented as a normal block device or a generic device (which happens when the drive has been hardware RAIDed by certain controllers such as the one on the IBM HS22). How can we do the equivalent under Windows? Does anyone actually use smartmontools? Or are there other packages out there? The problem is that SCSI Generic devices just don't show up in Windows. If the drives aren't RAIDed we can see them fine. How I'd do it in Linux: sles11-live:~ # lsscsi -g [1:0:0:0] disk SMART USB-IBM 8989 /dev/sda /dev/sg0 [2:0:0:0] disk ATA MTFDDAK256MAR-1K MA44 - /dev/sg1 [2:0:1:0] disk ATA MTFDDAK256MAR-1K MA44 - /dev/sg2 [2:1:8:0] disk LSILOGIC Logical Volume 3000 /dev/sdb /dev/sg3 sles11-live:~ # smartctl -l ssd /dev/sg1 smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32.49-0.3-default] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net Device Statistics (GP Log 0x04) Page Offset Size Value Description 7 ===== = = == Solid State Device Statistics (rev 1) == 7 0x008 1 26~ Percentage Used Endurance Indicator |_ ~ normalized value sles11-live:~ # smartctl -l ssd /dev/sg2 smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32.49-0.3-default] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net Device Statistics (GP Log 0x04) Page Offset Size Value Description 7 ===== = = == Solid State Device Statistics (rev 1) == 7 0x008 1 3~ Percentage Used Endurance Indicator |_ ~ normalized value

Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 18/66 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

- by DavidM

- by ale

- by Howiecamp

- by Jake

- by Sachin Chourasiya

- by Sachin Chourasiya

- by Paul White

- by pinaldave

- by michael

- by rhossi

- by pinaldave

- by pinaldave

- by Tamarick Hill

- by Justin984

- by Liu Maclean(???)

- by Tara Kizer

- by JoshReuben

- by Maria Colgan

- by D'Arcy Lussier

- by Nicholas

- by hoberion

- by Anthony Shorten

- by user12620111

- by MikeyB

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >