Search Results

Search found 2150 results on 86 pages for 'jean paul calderone'.

Page 9/86 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >

Display all feeds using simplexml_load() using PHP

- by Jean

Hello, I want to loop to get all feeds, but displaying only one $url = "http://localhost/feeds/feeds.rss"; $xml = simplexml_load_file($url); foreach($xml->item as $result){ echo $result->description."<br>"; } RSS Feed is - <channel> <title>/</title> <link>/</link> <atom:link type="application/rss+xml" href="/" rel="self"/> <description>/</description> <language>/</language> <ttl>/</ttl> <item> <title>/</title> <description>/</description> <pubDate>/</pubDate> <guid>/</guid> <link>/</link> </item> <item> <title>/</title> <description>/</description> <pubDate>/</pubDate> <guid>/</guid> <link>/</link> </item> </channel> Thanks Jean

Read the article
get id parameter and increment count for each click in jquery

- by Jean

Hello, I want to get the id value of c_i. And each time c_i is clicked I want to increment the value at chk_count, using jquery <div id='d<? echo $i; ?>' style='margin-bottom:8px; border:#cccccc thin solid; height:25px;'> <span style='color:#cccccc; margin-right:5px;'><? echo $i; ?></span> <span><? echo $row_dw_all['d1e']; ?></span> <span style='position:absolute;right:0px;'> <input type='checkbox' name='c_i<? echo $i; ?>' id="c_i<? echo $i; ?>" value='<? echo $row_dw_all['dle']; ?>'> </span> </div> <input type="hidden" name="chk_count" id="chk_count" value="" /> Thanks Jean

Read the article
Oracle Communications Data Model

- by jean-pierre.dijcks

I've mentioned OCDM in previous posts but found the following (see end of the post) podcast on the topic and figured it is worthwhile to spread the news some more. ORetailDM and OCommunicationsDM are the two data models currently available from Oracle. Both are intended to capture: Business best practices and industry knowledge Pre-built advanced analytics intended to predict future events before they happen (like the Churn model shown below) Oracle technology best practices to ensure optimal performance of the model All of this typically comes with a reduced time to implementation, or as the marketing slogan goes, reduced time to value. Here are the links: Podcast on OCDM OTN pages for OCDM and ORDM

Read the article
Heaps of Trouble?

- by Paul White NZ

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material. In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all. The problem is, predictably, that performance is not very good. The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found. Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables. A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings). The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan. The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate. Every join in the plan shown above estimates that it will produce just a single row as output. Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows. It should not surprise you that this has rather dire consequences for the remainder of the query plan. In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each. The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints. A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query. The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement! As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there. (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B). Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear. The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools. If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on. Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID. The term ‘eager’ means that the spool consumes all of its input rows when it starts up. The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index. From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table. The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself. The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation. Nested loops join preserves the order of rows on its outer input, so sorting early is safe. (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input. It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation. It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins. You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins. The answer, again, is down to the poor information available to the optimizer. Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs. SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk. The join process then continues, and may again run out of memory. This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow. The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion. You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this). Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it? We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs. The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk. Even so, the query runs to completion in three or four seconds. That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information. We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world. It’s there primarily for its educational and entertainment value. I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details. My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
Collaborate10 – THEconference

- by jean-pierre.dijcks

After spending a few days in Mandalay Bay's THEHotel, I guess I now call everything THE... Seriously, they even tag their toilet paper with THEtp... I guess the brand builders in Vegas thought that once you are on to something you keep on doing it, and granted it is a nice hotel with nice rooms. THEanalytics Most of my collab10 experience was in a room called Reef C, where the BIWA bootcamp was held. Two solid days of BI, Warehousing and Analytics organized by the BIWA SIG at IOUG. Didn't get to see all sessions, but what struck me was the high interest in Analytics. Marty Gubar's OLAP session was full and he did some very nice things with the OLAP option. The cool bit was that he actually gets all the advanced calculations in OLAP to show up in OBI EE without any effort. It was nice to see that the idea from OWB where you generate an RPD is now also in AWM. I think it makes life so much simpler to generate these RPD's from your data model. Even if the end RPD needs some tweaking, it is all a lot less effort to get something going. You can see this stuff for yourself in this demo (click here). OBI EE uses just SQL to get to the calculations, and so, if you prefer APEX, you can build you application there and get the same nice calculations in an APEX application. Marty also showed the Simba MDX driver used with Excel. I guess we should call that THEcoolone... and it is very slick and wonderfully useful for all of you who actually know Excel. The nice thing is that you leverage pure Excel for all operations (no plug-ins). That means no new tools to learn, no new controls, all just pure Excel. THEdatabasemachine Got some very good questions in my "what makes Exadata fast" session and overall, the interest in Exadata is overwhelming. One of the things that I did try to do in my session is to get people to think in new patterns rather than in patterns based on Oracle 9i running on some random hardware configuration. We talked a little bit about the often over-indexing and how everyone has to unlearn all of that on Exadata. The main thing however is that everyone needs to get used to the shear size of some of the components in a Database machine V2. 5TB of flash cache is a lot of very fast data storage, half a TB of memory gets quite interesting as well. So what I did there was really focus on some of the content in these earlier posts on Upward ILM and In-Memory processing. In short, I do believe the these newer media point out a trend. In-memory and other fast media will get cheaper and will see more use. Some of that we do automatically by adding new functionality, but in some cases I think the end user of the system needs to start thinking about how to leverage all this new hardware. I think most people got very excited about these new capabilities and opportunities. THEcoolkids One of the cool things about the BIWA track was the hand-on track. Very cool to see big crowds for both OLAP and OWB hands-on. Also quite nice to see that the folks at RittmanMead spent so much time on preparing for that session. While all of them put down cool stuff, none was more cool that seeing Data Mining on an Apple iPAD... it all just looks great on an iPAD! Very disappointing to see that Mark Rittman still wasn't showing OWB on his iPAD ;-) THEend All in all this was a great set of sessions in the BIWA track. Lots of value to our guests (we hope) and we hope they all come again next year!

Read the article
Big Data’s Killer App…

- by jean-pierre.dijcks

Recently Keith spent some time talking about the cloud on this blog and I will spare you my thoughts on the whole thing. What I do want to write down is something about the Big Data movement and what I think is the killer app for Big Data... Where is this coming from, ok, I confess... I spent 3 days in cloud land at the Cloud Connect conference in Santa Clara and it was quite a lot of fun. One of the nice things at Cloud Connect was that there was a track dedicated to Big Data, which prompted me to some extend to write this post. What is Big Data anyways? The most valuable point made in the Big Data track was that Big Data in itself is not very cool. Doing something with Big Data is what makes all of this cool and interesting to a business user! The other good insight I got was that a lot of people think Big Data means a single gigantic monolithic system holding gazillions of bytes or documents or log files. Well turns out that most people in the Big Data track are talking about a lot of collections of smaller data sets. So rather than thinking "big = monolithic" you should be thinking "big = many data sets". This is more than just theoretical, it is actually relevant when thinking about big data and how to process it. It is important because it means that the platform that stores data will most likely consist out of multiple solutions. You may be storing logs on something like HDFS, you may store your customer information in Oracle and you may store distilled clickstream information in some distilled form in MySQL. The big question you will need to solve is not what lives where, but how to get it all together and get some value out of all that data. NoSQL and MapReduce Nope, sorry, this is not the killer app... and no I'm not saying this because my business card says Oracle and I'm therefore biased. I think language is important, but as with storage I think pragmatic is better. In other words, some questions can be answered with SQL very efficiently, others can be answered with PERL or TCL others with MR. History should teach us that anyone trying to solve a problem will use any and all tools around. For example, most data warehouses (Big Data 1.0?) get a lot of data in flat files. Everyone then runs a bunch of shell scripts to massage or verify those files and then shoves those files into the database. We've even built shell script support into external tables to allow for this. I think the Big Data projects will do the same. Some people will use MapReduce, although I would argue that things like Cascading are more interesting, some people will use Java. Some data is stored on HDFS making Cascading the way to go, some data is stored in Oracle and SQL does do a good job there. As with storage and with history, be pragmatic and use what fits and neither NoSQL nor MR will be the one and only. Also, a language, while important, does in itself not deliver business value. So while cool it is not a killer app... Vertical Behavioral Analytics This is the killer app! And you are now thinking: "what does that mean?" Let's decompose that heading. First of all, analytics. I would think you had guessed by now that this is really what I'm after, and of course you are right. But not just analytics, which has a very large scope and means many things to many people. I'm not just after Business Intelligence (analytics 1.0?) or data mining (analytics 2.0?) but I'm after something more interesting that you can only do after collecting large volumes of specific data. That all important data is about behavior. What do my customers do? More importantly why do they behave like that? If you can figure that out, you can tailor web sites, stores, products etc. to that behavior and figure out how to be successful. Today's behavior that is somewhat easily tracked is web site clicks, search patterns and all of those things that a web site or web server tracks. that is where the Big Data lives and where these patters are now emerging. Other examples however are emerging, and one of the examples used at the conference was about prediction churn for a telco based on the social network its members are a part of. That social network is not about LinkedIn or Facebook, but about who calls whom. I call you a lot, you switch provider, and I might/will switch too. And that just naturally brings me to the next word, vertical. Vertical in this context means per industry, e.g. communications or retail or government or any other vertical. The reason for being more specific than just behavioral analytics is that each industry has its own data sources, has its own quirky logic and has its own demands and priorities. Of course, the methods and some of the software will be common and some will have both retail and service industry analytics in place (your corner coffee store for example). But the gist of it all is that analytics that can predict customer behavior for a specific focused group of people in a specific industry is what makes Big Data interesting. Building a Vertical Behavioral Analysis System Well, that is going to be interesting. I have not seen much going on in that space and if I had to have some criticism on the cloud connect conference it would be the lack of concrete user cases on big data. The telco example, while a step into the vertical behavioral part is not really on big data. It used a sample of data from the customers' data warehouse. One thing I do think, and this is where I think parts of the NoSQL stuff come from, is that we will be doing this analysis where the data is. Over the past 10 years we at Oracle have called this in-database analytics. I guess we were (too) early? Now the entire market is going there including companies like SAS. In-place btw does not mean "no data movement at all", what it means that you will do this on data's permanent home. For SAS that is kind of the current problem. Most of the inputs live in a data warehouse. So why move it into SAS and back? That all worked with 1 TB data warehouses, but when we are looking at 100TB to 500 TB of distilled data... Comments? As it is still early days with these systems, I'm very interested in seeing reactions and thoughts to some of these thoughts...

Read the article
Limiting DOPs – Who rules over whom?

- by jean-pierre.dijcks

I've gotten a couple of questions from Dan Morgan and figured I start to answer them in this way. While Dan is running on a big system he is running with Database Resource Manager and he is trying to make sure the system doesn't go crazy (remember end user are never, ever crazy!) on very high DOPs. Q: How do I control statements with very high DOPs driven from user hints in queries? A: The best way to do this is to work with DBRM and impose limits on consumer groups. The Max DOP setting you can set in DBRM allows you to overwrite the hint. Now let's go into some more detail here. Assume my object (and for simplicity we assume there is a single object - and do remember that we always pick the highest DOP when in doubt and when conflicting DOPs are available in a query) has PARALLEL 64 as its setting. Assume that the query that selects something cool from that table lives in a consumer group with a max DOP of 32. Assume no goofy things (like running out of parallel_max_servers) are happening. A query selecting from this table will run at DOP 32 because DBRM caps the DOP. As of 11.2.0.1 we also use the DBRM cap to create the original plan (at compile time) and not just enforce the cap at runtime. Now, my user is smart and writes a query with a parallel hint requesting DOP 128. This query is still capped by DBRM and DBRM overrules the hint in the statement. The statement, despite the hint, runs at DOP 32. Note that in the hinted scenario we do compile the statement with DOP 128 (the optimizer obeys the hint). This is another reason to use table decoration rather than hints. Q: What happens if I set parallel_max_servers higher than processes (e.g. the max number of processes allowed to run on my machine)? A: Processes rules. It is important to understand that processes are fixed at startup time. If you increase parallel_max_servers above the number of processes in the processes parameter you should get a warning in the alert log stating it can not take effect. As a follow up, a hinted query requesting more parallel processes than either parallel_max_servers or processes will not be able to acquire the requested number. Parallel_max_processes will prevent this. And since parallel_max_servers should be lower than max processes you can never go over either...

Read the article
So…is it a Seek or a Scan?

- by Paul White

You’re probably most familiar with the terms ‘Seek’ and ‘Scan’ from the graphical plans produced by SQL Server Management Studio (SSMS). The image to the left shows the most common ones, with the three types of scan at the top, followed by four types of seek. You might look to the SSMS tool-tip descriptions to explain the differences between them: Not hugely helpful are they? Both mention scans and ranges (nothing about seeks) and the Index Seek description implies that it will not scan the index entirely (which isn’t necessarily true). Recall also yesterday’s post where we saw two Clustered Index Seek operations doing very different things. The first Seek performed 63 single-row seeking operations; and the second performed a ‘Range Scan’ (more on those later in this post). I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks? I have often wondered about that, and the first person to mention it after yesterday’s post was Erin Stellato (twitter | blog): Before we go on to make sense of all this, let’s look at another example of how SQL Server confusingly mixes the terms ‘Scan’ and ‘Seek’ in different contexts. The diagram below shows a very simple heap table with two columns, one of which is the non-clustered Primary Key, and the other has a non-unique non-clustered index defined on it. The right hand side of the diagram shows a simple query, it’s associated query plan, and a couple of extracts from the SSMS tool-tip and Properties windows. Notice the ‘scan direction’ entry in the Properties window snippet. Is this a seek or a scan? The different references to Scans and Seeks are even more pronounced in the XML plan output that the graphical plan is based on. This fragment is what lies behind the single Index Seek icon shown above: You’ll find the same confusing references to Seeks and Scans throughout the product and its documentation. Making Sense of Seeks Let’s forget all about scans for a moment, and think purely about seeks. Loosely speaking, a seek is the process of navigating an index B-tree to find a particular index record, most often at the leaf level. A seek starts at the root and navigates down through the levels of the index to find the point of interest: Singleton Lookups The simplest sort of seek predicate performs this traversal to find (at most) a single record. This is the case when we search for a single value using a unique index and an equality predicate. It should be readily apparent that this type of search will either find one record, or none at all. This operation is known as a singleton lookup. Given the example table from before, the following query is an example of a singleton lookup seek: Sadly, there’s nothing in the graphical plan or XML output to show that this is a singleton lookup – you have to infer it from the fact that this is a single-value equality seek on a unique index. The other common examples of a singleton lookup are bookmark lookups – both the RID and Key Lookup forms are singleton lookups (an RID lookup finds a single record in a heap from the unique row locator, and a Key Lookup does much the same thing on a clustered table). If you happen to run your query with STATISTICS IO ON, you will notice that ‘Scan Count’ is always zero for a singleton lookup. Range Scans The other type of seek predicate is a ‘seek plus range scan’, which I will refer to simply as a range scan. The seek operation makes an initial descent into the index structure to find the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in the index) until it reaches the end of the scan range. The ability of a range scan to proceed in either direction comes about because index pages at the same level are connected by a doubly-linked list – each page has a pointer to the previous page (in logical key order) as well as a pointer to the following page. The doubly-linked list is represented by the green and red dotted arrows in the index diagram presented earlier. One subtle (but important) point is that the notion of a ‘forward’ or ‘backward’ scan applies to the logical key order defined when the index was built. In the present case, the non-clustered primary key index was created as follows: CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col ASC) ) ; Notice that the primary key index specifies an ascending sort order for the single key column. This means that a forward scan of the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order. If the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order. A range scan seek predicate may have a Start condition, an End condition, or both. Where one is missing, the scan starts (or ends) at one extreme end of the index, depending on the scan direction. Some examples might help clarify that: the following diagram shows four queries, each of which performs a single seek against a column holding every integer from 1 to 100 inclusive. The results from each query are shown in the blue columns, and relevant attributes from the Properties window appear on the right: Query 1 specifies that all key_col values less than 5 should be returned in ascending order. The query plan achieves this by seeking to the start of the index leaf (there is no explicit starting value) and scanning forward until the End condition (key_col < 5) is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed to sort higher). Query 2 asks for key_col values greater than 95, in descending order. SQL Server returns these results by seeking to the end of the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95. Sharp-eyed readers may notice that the end-of-scan condition is shown as a Start range value. This is a bug in the XML show plan which bubbles up to the Properties window – when a backward scan is performed, the roles of the Start and End values are reversed, but the plan does not reflect that. Oh well. Query 3 looks for key_col values that are greater than or equal to 10, and less than 15, in ascending order. This time, SQL Server seeks to the first index record that matches the Start condition (key_col >= 10) and then scans forward through the leaf pages until the End condition (key_col < 15) is no longer met. Query 4 performs much the same sort of operation as Query 3, but requests the output in descending order. Again, we have to mentally reverse the Start and End conditions because of the bug, but otherwise the process is the same as always: SQL Server finds the highest-sorting record that meets the condition ‘key_col < 25’ and scans backward until ‘key_col >= 20’ is no longer true. One final point to note: seek operations always have the Ordered: True attribute. This means that the operator always produces rows in a sorted order, either ascending or descending depending on how the index was defined, and whether the scan part of the operation is forward or backward. You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of the sort order internally. In the four queries above, the query optimizer was able to avoid an explicit Sort operator to honour the ORDER BY clause, for example. Multiple Seek Predicates As we saw yesterday, a single index seek plan operator can contain one or more seek predicates. These seek predicates can either be all singleton seeks or all range scans – SQL Server does not mix them. For example, you might expect the following query to contain two seek predicates, a singleton seek to find the single record in the unique index where key_col = 10, and a range scan to find the key_col values between 15 and 20: SELECT key_col FROM dbo.Example WHERE key_col = 10 OR key_col BETWEEN 15 AND 20 ORDER BY key_col ASC ; In fact, SQL Server transforms the singleton seek (key_col = 10) to the equivalent range scan, Start:[key_col >= 10], End:[key_col <= 10]. This allows both range scans to be evaluated by a single seek operator. To be clear, this query results in two range scans: one from 10 to 10, and one from 15 to 20. Final Thoughts That’s it for today – tomorrow we’ll look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table. Yes, a seek. On a heap. Not an index! If you would like to run the queries in this post for yourself, there’s a script below. Thanks for reading! IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL BEGIN DROP TABLE dbo.Example; END ; -- Test table is a heap -- Non-clustered primary key on 'key_col' CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ; -- Non-unique non-clustered index on the 'data' column CREATE NONCLUSTERED INDEX [IX dbo.Example data] ON dbo.Example (data) ; -- Add 100 rows INSERT dbo.Example WITH (TABLOCKX) ( key_col, data ) SELECT key_col = V.number, data = V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 100 ; -- ================ -- Singleton lookup -- ================ ; -- Single value equality seek in a unique index -- Scan count = 0 when STATISTIS IO is ON -- Check the XML SHOWPLAN SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 32 ; -- =========== -- Range Scans -- =========== ; -- Query 1 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col <= 5 ORDER BY E.key_col ASC ; -- Query 2 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col > 95 ORDER BY E.key_col DESC ; -- Query 3 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 10 AND E.key_col < 15 ORDER BY E.key_col ASC ; -- Query 4 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 20 AND E.key_col < 25 ORDER BY E.key_col DESC ; -- Final query (singleton + range = 2 range scans) SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 10 OR E.key_col BETWEEN 15 AND 20 ORDER BY E.key_col ASC ; -- === TIDY UP === DROP TABLE dbo.Example; © 2011 Paul White email: [email protected] twitter: @SQL_Kiwi

Read the article
Hello Operator, My Switch Is Bored

- by Paul White

This is a post for T-SQL Tuesday #43 hosted by my good friend Rob Farley. The topic this month is Plan Operators. I haven’t taken part in T-SQL Tuesday before, but I do like to write about execution plans, so this seemed like a good time to start. This post is in two parts. The first part is primarily an excuse to use a pretty bad play on words in the title of this blog post (if you’re too young to know what a telephone operator or a switchboard is, I hate you). The second part of the post looks at an invisible query plan operator (so to speak). 1. My Switch Is Bored Allow me to present the rare and interesting execution plan operator, Switch: Books Online has this to say about Switch: Following that description, I had a go at producing a Fast Forward Cursor plan that used the TOP operator, but had no luck. That may be due to my lack of skill with cursors, I’m not too sure. The only application of Switch in SQL Server 2012 that I am familiar with requires a local partitioned view: CREATE TABLE dbo.T1 (c1 int NOT NULL CHECK (c1 BETWEEN 00 AND 24)); CREATE TABLE dbo.T2 (c1 int NOT NULL CHECK (c1 BETWEEN 25 AND 49)); CREATE TABLE dbo.T3 (c1 int NOT NULL CHECK (c1 BETWEEN 50 AND 74)); CREATE TABLE dbo.T4 (c1 int NOT NULL CHECK (c1 BETWEEN 75 AND 99)); GO CREATE VIEW V1 AS SELECT c1 FROM dbo.T1 UNION ALL SELECT c1 FROM dbo.T2 UNION ALL SELECT c1 FROM dbo.T3 UNION ALL SELECT c1 FROM dbo.T4; Not only that, but it needs an updatable local partitioned view. We’ll need some primary keys to meet that requirement: ALTER TABLE dbo.T1 ADD CONSTRAINT PK_T1 PRIMARY KEY (c1); ALTER TABLE dbo.T2 ADD CONSTRAINT PK_T2 PRIMARY KEY (c1); ALTER TABLE dbo.T3 ADD CONSTRAINT PK_T3 PRIMARY KEY (c1); ALTER TABLE dbo.T4 ADD CONSTRAINT PK_T4 PRIMARY KEY (c1); We also need an INSERT statement that references the view. Even more specifically, to see a Switch operator, we need to perform a single-row insert (multi-row inserts use a different plan shape): INSERT dbo.V1 (c1) VALUES (1); And now…the execution plan: The Constant Scan manufactures a single row with no columns. The Compute Scalar works out which partition of the view the new value should go in. The Assert checks that the computed partition number is not null (if it is, an error is returned). The Nested Loops Join executes exactly once, with the partition id as an outer reference (correlated parameter). The Switch operator checks the value of the parameter and executes the corresponding input only. If the partition id is 0, the uppermost Clustered Index Insert is executed, adding a row to table T1. If the partition id is 1, the next lower Clustered Index Insert is executed, adding a row to table T2…and so on. In case you were wondering, here’s a query and execution plan for a multi-row insert to the view: INSERT dbo.V1 (c1) VALUES (1), (2); Yuck! An Eager Table Spool and four Filters! I prefer the Switch plan. My guess is that almost all the old strategies that used a Switch operator have been replaced over time, using things like a regular Concatenation Union All combined with Start-Up Filters on its inputs. Other new (relative to the Switch operator) features like table partitioning have specific execution plan support that doesn’t need the Switch operator either. This feels like a bit of a shame, but perhaps it is just nostalgia on my part, it’s hard to know. Please do let me know if you encounter a query that can still use the Switch operator in 2012 – it must be very bored if this is the only possible modern usage! 2. Invisible Plan Operators The second part of this post uses an example based on a question Dave Ballantyne asked using the SQL Sentry Plan Explorer plan upload facility. If you haven’t tried that yet, make sure you’re on the latest version of the (free) Plan Explorer software, and then click the Post to SQLPerformance.com button. That will create a site question with the query plan attached (which can be anonymized if the plan contains sensitive information). Aaron Bertrand and I keep a close eye on questions there, so if you have ever wanted to ask a query plan question of either of us, that’s a good way to do it. The problem The issue I want to talk about revolves around a query issued against a calendar table. The script below creates a simplified version and adds 100 years of per-day information to it: USE tempdb; GO CREATE TABLE dbo.Calendar ( dt date NOT NULL, isWeekday bit NOT NULL, theYear smallint NOT NULL, CONSTRAINT PK__dbo_Calendar_dt PRIMARY KEY CLUSTERED (dt) ); GO -- Monday is the first day of the week for me SET DATEFIRST 1; -- Add 100 years of data INSERT dbo.Calendar WITH (TABLOCKX) (dt, isWeekday, theYear) SELECT CA.dt, isWeekday = CASE WHEN DATEPART(WEEKDAY, CA.dt) IN (6, 7) THEN 0 ELSE 1 END, theYear = YEAR(CA.dt) FROM Sandpit.dbo.Numbers AS N CROSS APPLY ( VALUES (DATEADD(DAY, N.n - 1, CONVERT(date, '01 Jan 2000', 113))) ) AS CA (dt) WHERE N.n BETWEEN 1 AND 36525; The following query counts the number of weekend days in 2013: SELECT Days = COUNT_BIG(*) FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; It returns the correct result (104) using the following execution plan: The query optimizer has managed to estimate the number of rows returned from the table exactly, based purely on the default statistics created separately on the two columns referenced in the query’s WHERE clause. (Well, almost exactly, the unrounded estimate is 104.289 rows.) There is already an invisible operator in this query plan – a Filter operator used to apply the WHERE clause predicates. We can see it by re-running the query with the enormously useful (but undocumented) trace flag 9130 enabled: Now we can see the full picture. The whole table is scanned, returning all 36,525 rows, before the Filter narrows that down to just the 104 we want. Without the trace flag, the Filter is incorporated in the Clustered Index Scan as a residual predicate. It is a little bit more efficient than using a separate operator, but residual predicates are still something you will want to avoid where possible. The estimates are still spot on though: Anyway, looking to improve the performance of this query, Dave added the following filtered index to the Calendar table: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear) WHERE isWeekday = 0; The original query now produces a much more efficient plan: Unfortunately, the estimated number of rows produced by the seek is now wrong (365 instead of 104): What’s going on? The estimate was spot on before we added the index! Explanation You might want to grab a coffee for this bit. Using another trace flag or two (8606 and 8612) we can see that the cardinality estimates were exactly right initially: The highlighted information shows the initial cardinality estimates for the base table (36,525 rows), the result of applying the two relational selects in our WHERE clause (104 rows), and after performing the COUNT_BIG(*) group by aggregate (1 row). All of these are correct, but that was before cost-based optimization got involved :) Cost-based optimization When cost-based optimization starts up, the logical tree above is copied into a structure (the ‘memo’) that has one group per logical operation (roughly speaking). The logical read of the base table (LogOp_Get) ends up in group 7; the two predicates (LogOp_Select) end up in group 8 (with the details of the selections in subgroups 0-6). These two groups still have the correct cardinalities as trace flag 8608 output (initial memo contents) shows: During cost-based optimization, a rule called SelToIdxStrategy runs on group 8. It’s job is to match logical selections to indexable expressions (SARGs). It successfully matches the selections (theYear = 2013, is Weekday = 0) to the filtered index, and writes a new alternative into the memo structure. The new alternative is entered into group 8 as option 1 (option 0 was the original LogOp_Select): The new alternative is to do nothing (PhyOp_NOP = no operation), but to instead follow the new logical instructions listed below the NOP. The LogOp_GetIdx (full read of an index) goes into group 21, and the LogOp_SelectIdx (selection on an index) is placed in group 22, operating on the result of group 21. The definition of the comparison ‘the Year = 2013’ (ScaOp_Comp downwards) was already present in the memo starting at group 2, so no new memo groups are created for that. New Cardinality Estimates The new memo groups require two new cardinality estimates to be derived. First, LogOp_Idx (full read of the index) gets a predicted cardinality of 10,436. This number comes from the filtered index statistics: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH STAT_HEADER; The second new cardinality derivation is for the LogOp_SelectIdx applying the predicate (theYear = 2013). To get a number for this, the cardinality estimator uses statistics for the column ‘theYear’, producing an estimate of 365 rows (there are 365 days in 2013!): DBCC SHOW_STATISTICS (Calendar, theYear) WITH HISTOGRAM; This is where the mistake happens. Cardinality estimation should have used the filtered index statistics here, to get an estimate of 104 rows: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH HISTOGRAM; Unfortunately, the logic has lost sight of the link between the read of the filtered index (LogOp_GetIdx) in group 22, and the selection on that index (LogOp_SelectIdx) that it is deriving a cardinality estimate for, in group 21. The correct cardinality estimate (104 rows) is still present in the memo, attached to group 8, but that group now has a PhyOp_NOP implementation. Skipping over the rest of cost-based optimization (in a belated attempt at brevity) we can see the optimizer’s final output using trace flag 8607: This output shows the (incorrect, but understandable) 365 row estimate for the index range operation, and the correct 104 estimate still attached to its PhyOp_NOP. This tree still has to go through a few post-optimizer rewrites and ‘copy out’ from the memo structure into a tree suitable for the execution engine. One step in this process removes PhyOp_NOP, discarding its 104-row cardinality estimate as it does so. To finish this section on a more positive note, consider what happens if we add an OVER clause to the query aggregate. This isn’t intended to be a ‘fix’ of any sort, I just want to show you that the 104 estimate can survive and be used if later cardinality estimation needs it: SELECT Days = COUNT_BIG(*) OVER () FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; The estimated execution plan is: Note the 365 estimate at the Index Seek, but the 104 lives again at the Segment! We can imagine the lost predicate ‘isWeekday = 0’ as sitting between the seek and the segment in an invisible Filter operator that drops the estimate from 365 to 104. Even though the NOP group is removed after optimization (so we don’t see it in the execution plan) bear in mind that all cost-based choices were made with the 104-row memo group present, so although things look a bit odd, it shouldn’t affect the optimizer’s plan selection. I should also mention that we can work around the estimation issue by including the index’s filtering columns in the index key: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear, isWeekday) WHERE isWeekday = 0 WITH (DROP_EXISTING = ON); There are some downsides to doing this, including that changes to the isWeekday column may now require Halloween Protection, but that is unlikely to be a big problem for a static calendar table ;) With the updated index in place, the original query produces an execution plan with the correct cardinality estimation showing at the Index Seek: That’s all for today, remember to let me know about any Switch plans you come across on a modern instance of SQL Server! Finally, here are some other posts of mine that cover other plan operators: Segment and Sequence Project Common Subexpression Spools Why Plan Operators Run Backwards Row Goals and the Top Operator Hash Match Flow Distinct Top N Sort Index Spools and Page Splits Singleton and Range Seeks Bitmaps Hash Join Performance Compute Scalar © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
"Software sources" crashes since unbuntu updated from 12.04 to 12.10

- by Jean-Sebastien

First of all, sorry for my English, I is not my native language. I recently updated my PC from ubuntu 12.04 to 12.10. Now “Software sources” crashes when I try to open it directly from Unity or from Ubuntu Software Center ? Software sources. When I try to open “Update manager”, I get the following error message. Note that the internet connection WROKS! W:Failed to fetch http://ppa.launchpad.net/rye/ubuntuone-extras/ubuntu/dists/quantal/main/source/Sources 404 Not Found, ... ... ... E:Some index files failed to download. They have been ignored, or old ones used instead. Please, can somebody help me on this? JS

Read the article
Partition Wise Joins

- by jean-pierre.dijcks

Some say they are the holy grail of parallel computing and PWJ is the basis for a shared nothing system and the only join method that is available on a shared nothing system (yes this is oversimplified!). The magic in Oracle is of course that is one of many ways to join data. And yes, this is the old flexibility vs. simplicity discussion all over, so I won't go there... the point is that what you must do in a shared nothing system, you can do in Oracle with the same speed and methods. The Theory A partition wise join is a join between (for simplicity) two tables that are partitioned on the same column with the same partitioning scheme. In shared nothing this is effectively hard partitioning locating data on a specific node / storage combo. In Oracle is is logical partitioning. If you now join the two tables on that partitioned column you can break up the join in smaller joins exactly along the partitions in the data. Since they are partitioned (grouped) into the same buckets, all values required to do the join live in the equivalent bucket on either sides. No need to talk to anyone else, no need to redistribute data to anyone else... in short, the optimal join method for parallel processing of two large data sets. PWJ's in Oracle Since we do not hard partition the data across nodes in Oracle we use the Partitioning option to the database to create the buckets, then set the Degree of Parallelism (or run Auto DOP - see here) and get our PWJs. The main questions always asked are: How many partitions should I create? What should my DOP be? In a shared nothing system the answer is of course, as many partitions as there are nodes which will be your DOP. In Oracle we do want you to look at the workload and concurrency, and once you know that to understand the following rules of thumb. Within Oracle we have more ways of joining of data, so it is important to understand some of the PWJ ideas and what it means if you have an uneven distribution across processes. Assume we have a simple scenario where we partition the data on a hash key resulting in 4 hash partitions (H1 -H4). We have 2 parallel processes that have been tasked with reading these partitions (P1 - P2). The work is evenly divided assuming the partitions are the same size and we can scan this in time t1 as shown below. Now assume that we have changed the system and have a 5th partition but still have our 2 workers P1 and P2. The time it takes is actually 50% more assuming the 5th partition has the same size as the original H1 - H4 partitions. In other words to scan these 5 partitions, the time t2 it takes is not 1/5th more expensive, it is a lot more expensive and some other join plans may now start to look exciting to the optimizer. Just to post the disclaimer, it is not as simple as I state it here, but you get the idea on how much more expensive this plan may now look... Based on this little example there are a few rules of thumb to follow to get the partition wise joins. First, choose a DOP that is a factor of two (2). So always choose something like 2, 4, 8, 16, 32 and so on... Second, choose a number of partitions that is larger or equal to 2* DOP. Third, make sure the number of partitions is divisible through 2 without orphans. This is also known as an even number... Fourth, choose a stable partition count strategy, which is typically hash, which can be a sub partitioning strategy rather than the main strategy (range - hash is a popular one). Fifth, make sure you do this on the join key between the two large tables you want to join (and this should be the obvious one...). Translating this into an example: DOP = 8 (determined based on concurrency or by using Auto DOP with a cap due to concurrency) says that the number of partitions >= 16. Number of hash (sub) partitions = 32, which gives each process four partitions to work on. This number is somewhat arbitrary and depends on your data and system. In this case my main reasoning is that if you get more room on the box you can easily move the DOP for the query to 16 without repartitioning... and of course it makes for no leftovers on the table... And yes, we recommend up-to-date statistics. And before you start complaining, do read this post on a cool way to do stats in 11.

Read the article
Big Data Accelerator

- by Jean-Pierre Dijcks

For everyone who does not regularly listen to earnings calls, Oracle's Q4 call was interesting (as it mostly is). One of the announcements in the call was the Big Data Accelerator from Oracle (Seeking Alpha link here - slightly tweaked for correctness shown below): "The big data accelerator includes some of the standard open source software, HDFS, the file system and a number of other pieces, but also some Oracle components that we think can dramatically speed up the entire map-reduce process. And will be particularly attractive to Java programmers [...]. There are some interesting applications they do, ETL is one. Log processing is another. We're going to have a lot of those features, functions and pre-built applications in our big data accelerator." Not much else we can say right now, more on this (and Big Data in general) at Openworld!

Read the article
Oracle Database Machine and Exadata Storage Server

- by jean-marc.gaudron(at)oracle.com

Master Note for Oracle Database Machine and Exadata Storage Server (Doc ID 1187674.1)This Master Note is intended to provide an index and references to the most frequently used My Oracle Support Notes with respect to Oracle Exadata and Oracle Database Machine environments. This Master Note is subdivided into categories to allow for easy access and reference to notes that are applicable to your area of interest. This includes the following categories: • Database Machine and Exadata Storage Server Concepts and Overview• Database Machine and Exadata Storage Server Configuration and Administration• Database Machine and Exadata Storage Server Troubleshooting and Debugging• Database Machine and Exadata Storage Server Best Practices• Database Machine and Exadata Storage Server Patching• Database Machine and Exadata Storage Server Documentation and References• Database Machine and Exadata Storage Server Known Problems• ASM and RAC Documentation• Using My Oracle Support Effectively

Read the article
Partition Wise Joins II

- by jean-pierre.dijcks

One of the things that I did not talk about in the initial partition wise join post was the effect it has on resource allocation on the database server. When Oracle applies a different join method - e.g. not PWJ - what you will see in SQL Monitor (in Enterprise Manager) or in an Explain Plan is a set of producers and a set of consumers. The producers scan the tables in the the join. If there are two tables the producers first scan one table, then the other. The producers thus provide data to the consumers, and when the consumers have the data from both scans they do the join and give the data to the query coordinator. Now that behavior means that if you choose a degree of parallelism of 4 to run such query with, Oracle will allocate 8 parallel processes. Of these 8 processes 4 are producers and 4 are consumers. The consumers only actually do work once the producers are fully done with scanning both sides of the join. In the plan above you can see that the producers access table SALES [line 11] and then do a PX SEND [line 9]. That is the producer set of processes working. The consumers receive that data [line 8] and twiddle their thumbs while the producers go on and scan CUSTOMERS. The producers send that data to the consumer indicated by PX SEND [line 5]. After receiving that data [line 4] the consumers do the actual join [line 3] and give the data to the QC [line 2]. BTW, the myth that you see twice the number of processes due to the setting PARALLEL_THREADS_PER_CPU=2 is obviously not true. The above is why you will see 2 times the processes of the DOP. In a PWJ plan the consumers are not present. Instead of producing rows and giving those to different processes, a PWJ only uses a single set of processes. Each process reads its piece of the join across the two tables and performs the join. The plan here is notably different from the initial plan. First of all the hash join is done right on top of both table scans [line 8]. This query is a little more complex than the previous so there is a bit of noise above that bit of info, but for this post, lets ignore that (sort stuff). The important piece here is that the PWJ plan typically will be faster and from a PX process number / resources typically cheaper. You may want to look out for those plans and try to get those to appear a lot... CREDITS: credits for the plans and some of the info on the plans go to Maria, as she actually produced these plans and is the expert on plans in general... You can see her talk about explaining the explain plan and other optimizer stuff over here: ODTUG in Washington DC, June 27 - July 1 On the Optimizer blog At OpenWorld in San Francisco, September 19 - 23 Happy joining and hope to see you all at ODTUG and OOW...

Read the article
Auto DOP and Concurrency

- by jean-pierre.dijcks

After spending some time in the cloud, I figured it is time to come down to earth and start discussing some of the new Auto DOP features some more. As Database Machines (the v2 machine runs Oracle Database 11.2) are effectively selling like hotcakes, it makes some sense to talk about the new parallel features in more detail. For basic understanding make sure you have read the initial post. The focus there is on Auto DOP and queuing, which is to some extend the focus here. But now I want to discuss the concurrency a little and explain some of the relevant parameters and their impact, specifically in a situation with concurrency on the system. The goal of Auto DOP The idea behind calculating the Automatic Degree of Parallelism is to find the highest possible DOP (ideal DOP) that still scales. In other words, if we were to increase the DOP even more above a certain DOP we would see a tailing off of the performance curve and the resource cost / performance would become less optimal. Therefore the ideal DOP is the best resource/performance point for that statement. The goal of Queuing On a normal production system we should see statements running concurrently. On a Database Machine we typically see high concurrency rates, so we need to find a way to deal with both high DOP’s and high concurrency. Queuing is intended to make sure we Don’t throttle down a DOP because other statements are running on the system Stay within the physical limits of a system’s processing power Instead of making statements go at a lower DOP we queue them to make sure they will get all the resources they want to run efficiently without trashing the system. The theory – and hopefully – practice is that by giving a statement the optimal DOP the sum of all statements runs faster with queuing than without queuing. Increasing the Number of Potential Parallel Statements To determine how many statements we will consider running in parallel a single parameter should be looked at. That parameter is called PARALLEL_MIN_TIME_THRESHOLD. The default value is set to 10 seconds. So far there is nothing new here…, but do realize that anything serial (e.g. that stays under the threshold) goes straight into processing as is not considered in the rest of this post. Now, if you have a system where you have two groups of queries, serial short running and potentially parallel long running ones, you may want to worry only about the long running ones with this parallel statement threshold. As an example, lets assume the short running stuff runs on average between 1 and 15 seconds in serial (and the business is quite happy with that). The long running stuff is in the realm of 1 – 5 minutes. It might be a good choice to set the threshold to somewhere north of 30 seconds. That way the short running queries all run serial as they do today (if it ain’t broken, don’t fix it) and allows the long running ones to be evaluated for (higher degrees of) parallelism. This makes sense because the longer running ones are (at least in theory) more interesting to unleash a parallel processing model on and the benefits of running these in parallel are much more significant (again, that is mostly the case). Setting a Maximum DOP for a Statement Now that you know how to control how many of your statements are considered to run in parallel, lets talk about the specific degree of any given statement that will be evaluated. As the initial post describes this is controlled by PARALLEL_DEGREE_LIMIT. This parameter controls the degree on the entire cluster and by default it is CPU (meaning it equals Default DOP). For the sake of an example, let’s say our Default DOP is 32. Looking at our 5 minute queries from the previous paragraph, the limit to 32 means that none of the statements that are evaluated for Auto DOP ever runs at more than DOP of 32. Concurrently Running a High DOP A basic assumption about running high DOP statements at high concurrency is that you at some point in time (and this is true on any parallel processing platform!) will run into a resource limitation. And yes, you can then buy more hardware (e.g. expand the Database Machine in Oracle’s case), but that is not the point of this post… The goal is to find a balance between the highest possible DOP for each statement and the number of statements running concurrently, but with an emphasis on running each statement at that highest efficiency DOP. The PARALLEL_SERVER_TARGET parameter is the all important concurrency slider here. Setting this parameter to a higher number means more statements get to run at their maximum parallel degree before queuing kicks in. PARALLEL_SERVER_TARGET is set per instance (so needs to be set to the same value on all 8 nodes in a full rack Database Machine). Just as a side note, this parameter is set in processes, not in DOP, which equates to 4* Default DOP (2 processes for a DOP, default value is 2 * Default DOP, hence a default of 4 * Default DOP). Let’s say we have PARALLEL_SERVER_TARGET set to 128. With our limit set to 32 (the default) we are able to run 4 statements concurrently at the highest DOP possible on this system before we start queuing. If these 4 statements are running, any next statement will be queued. To run a system at high concurrency the PARALLEL_SERVER_TARGET should be raised from its default to be much closer (start with 60% or so) to PARALLEL_MAX_SERVERS. By using both PARALLEL_SERVER_TARGET and PARALLEL_DEGREE_LIMIT you can control easily how many statements run concurrently at good DOPs without excessive queuing. Because each workload is a little different, it makes sense to plan ahead and look at these parameters and set these based on your requirements.

Read the article
Data Warehouse Best Practices

- by jean-pierre.dijcks

In our quest to share our endless wisdom (ahem…) one of the things we figured might be handy is recording some of the best practices for data warehousing. And so we did. And, we did some more… We now have recreated our websites on Oracle Technology Network and have a separate page for best practices, parallelism and other cool topics related to data warehousing. But the main topic of this post is the set of recorded best practices. Here is what is available (and it is a series that ties together but can be read independently), applicable for almost any database version: Partitioning 3NF schema design for a data warehouse Star schema design Data Loading Parallel Execution Optimizer and Stats management The best practices page has a lot of other useful information so have a look here.

Read the article
Assembly load and execute issue

- by Jean Carlos Suárez Marranzini

I'm trying to develop Assembly code allowing me to load and execute(by input of the user) 2 other Assembly .EXE programs. I'm having two problems: -I don't seem to be able to assign the pathname to a valid register(Or maybe incorrect syntax) -I need to be able to execute the other program after the first one (could be either) started its execution. This is what I have so far: mov ax,cs ; moving code segment to data segment mov ds,ax mov ah,1h ; here I read from keyboard int 21h mov dl,al cmp al,'1' ; if 1 jump to LOADRUN1 JE LOADRUN1 popf cmp al,'2' ; if 1 jump to LOADRUN2 JE LOADRUN2 popf LOADRUN1: MOV AH,4BH MOV AL,00 LEA DX,[PROGNAME1] ; Not sure if it works INT 21H LOADRUN2: MOV AH,4BH MOV AL,00 LEA DX,[PROGNAME2] ; Not sure if it works INT 21H ; Here I define the bytes containing the pathnames PROGNAME1 db 'C:\Users\Usuario\NASM\Adding.exe',0 PROGNAME2 db 'C:\Users\Usuario\NASM\Substracting.exe',0 I just don't know how start another program by input in the 'parent' program, after one is already executing. Thanks in advance for your help! Any additional information I'll be more than happy to provide. -I'm using NASM 16 bits, Windows 7 32 bits.

Read the article
Live from ODTUG - Big Data and SQL session #2

- by Jean-Pierre Dijcks

Sitting in Dominic Delmolino's session at ODTUG (KScope 12). If the session count at conferences is any indication then we will see more and more people start to deploy MapReduce in the database. And yes, that would be with SQL and PL/SQL first and foremost. Both Dominic and our own Bryn Llewellyn are doing MapReduce in the database presentations. Since I have seen both, I would advice people to first look through Dominic's session to get a good grasp on what mappers do and what reducers do, then dive into Bryn's for a bunch of PL/SQL example. The thing I like about Dominic's is the last slide (a recursive WITH statement) to do this in SQL... Now I am hoping that next year we will see tools vendors show off how they work with Hadoop and MapReduce (at least talking about the concepts!!).

Read the article
My Take on Hadoop World 2011

- by Jean-Pierre Dijcks

I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

Read the article
Automatic Storage Management (ASM)

- by jean-marc.gaudron(at)oracle.com

Master Note for Automatic Storage Management (ASM) (Doc ID 1187723.1)This Master Note is intended to provide an index and references to the most frequently used My Oracle Support Notes with respect to Oracle Automatic Storage Management (ASM) environments. This Master Note is subdivided into categories to allow for easy access and reference to notes that are applicable to your area of interest. This includes the following categories: Automatic Storage Management (ASM) Concepts and Overview Automatic Storage Management (ASM) Installation Automatic Storage Management (ASM) Configuration Automatic Storage Management (ASM) Administration Automatic Storage Management (ASM) Migration and Upgrade Automatic Storage Management (ASM) Monitoring Automatic Storage Management (ASM) Troubleshooting and Debugging Automatic Storage Management (ASM) Best Practices Automatic Storage Management (ASM) Versions and Patches ASMLIB Database Machine, Exadata Storage Server and RAC Documentation Using My Oracle Support Effectively

Read the article
Serial plans: Threshold / Parallel_degree_limit = 1

- by jean-pierre.dijcks

As a very short follow up on the previous post. So here is some more on getting a serial plan and why that happens Another reason - compared to the auto DOP is not on as we looked at in the earlier post - and often more prevalent to get a serial plan is if the plan simply does not take long enough to consider a parallel path. The resulting plan and note looks like this (note that this is a serial plan!): explain plan for select count(1) from sales; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 672559287 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 | | | | 1 | SORT AGGREGATE | | 1 | | | | | | 2 | PARTITION RANGE ALL| | 960 | 5 (0)| 00:00:01 | 1 | 16 | | 3 | TABLE ACCESS FULL | SALES | 960 | 5 (0)| 00:00:01 | 1 | 16 | Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold 14 rows selected. The parallel threshold is referring to parallel_min_time_threshold and since I did not change the default (10s) the plan is not being considered for a parallel degree computation and is therefore staying with the serial execution. Now we go into the land of crazy: Assume I do want this DOP=1 to happen, I could set the parameter in the init.ora, but to highlight it in this case I changed it on the session: alter session set parallel_degree_limit = 1; The result I get is: ERROR: ORA-02097: parameter cannot be modified because specified value is invalid ORA-00096: invalid value 1 for parameter parallel_degree_limit, must be from among CPU IO AUTO INTEGER>=2 Which of course makes perfect sense...

Read the article
DBCC CHECKDB on VVLDB and latches (Or: My Pain is Your Gain)

- by Argenis

Does your CHECKDB hurt, Argenis? There is a classic blog series by Paul Randal [blog|twitter] called “CHECKDB From Every Angle” which is pretty much mandatory reading for anybody who’s even remotely considering going for the MCM certification, or its replacement (the Microsoft Certified Solutions Master: Data Platform – makes my fingers hurt just from typing it). Of particular interest is the post “Consistency Options for a VLDB” – on it, Paul provides solid, timeless advice (I use the word “timeless” because it was written in 2007, and it all applies today!) on how to perform checks on very large databases. Well, here I was trying to figure out how to make CHECKDB run faster on a restored copy of one of our databases, which happens to exceed 7TB in size. The whole thing was taking several days on multiple systems, regardless of the storage used – SAS, SATA or even SSD…and I actually didn’t pay much attention to how long it was taking, or even bothered to look at the reasons why - as long as it was finishing okay and found no consistency errors. Yes – I know. That was a huge mistake, as corruption found in a database several days after taking place could only allow for further spread of the corruption – and potentially large data loss. In the last two weeks I increased my attention towards this problem, as we noticed that CHECKDB was taking EVEN LONGER on brand new all-flash storage in the SAN! I couldn’t really explain it, and were almost ready to blame the storage vendor. The vendor told us that they could initially see the server driving decent I/O – around 450Mb/sec, and then it would settle at a very slow rate of 10Mb/sec or so. “Hum”, I thought – “CHECKDB is just not pushing the I/O subsystem hard enough”. Perfmon confirmed the vendor’s observations. Dreaded @BlobEater What was CHECKDB doing all the time while doing so little I/O? Eating Blobs. It turns out that CHECKDB was taking an extremely long time on one of our frankentables, which happens to be have 35 billion rows (yup, with a b) and sucks up several terabytes of space in the database. We do have a project ongoing to purge/split/partition this table, so it’s just a matter of time before we deal with it. But the reality today is that CHECKDB is coming to a screeching halt in performance when dealing with this particular table. Checking sys.dm_os_waiting_tasks and sys.dm_os_latch_stats showed that LATCH_EX (DBCC_OBJECT_METADATA) was by far the top wait type. I remembered hearing recently about that wait from another post that Paul Randal made, but that was related to computed-column indexes, and in fact, Paul himself reminded me of his article via twitter. But alas, our pathologic table had no non-clustered indexes on computed columns. I knew that latches are used by the database engine to do internal synchronization – but how could I help speed this up? After all, this is stuff that doesn’t have a lot of knobs to tweak. (There’s a fantastic level 500 talk by Bob Ward from Microsoft CSS [blog|twitter] called “Inside SQL Server Latches” given at PASS 2010 – and you can check it out here. DISCLAIMER: I assume no responsibility for any brain melting that might ensue from watching Bob’s talk!) Failed Hypotheses Earlier on this week I flew down to Palo Alto, CA, to visit our Headquarters – and after having a great time with my Monkey peers, I was relaxing on the plane back to Seattle watching a great talk by SQL Server MVP and fellow MCM Maciej Pilecki [twitter] called “Masterclass: A Day in the Life of a Database Transaction” where he discusses many different topics related to transaction management inside SQL Server. Very good stuff, and when I got home it was a little late – that slow DBCC CHECKDB that I had been dealing with was way in the back of my head. As I was looking at the problem at hand earlier on this week, I thought “How about I set the database to read-only?” I remembered one of the things Maciej had (jokingly) said in his talk: “if you don’t want locking and blocking, set the database to read-only” (or something to that effect, pardon my loose memory). I immediately killed the CHECKDB which had been running painfully for days, and set the database to read-only mode. Then I ran DBCC CHECKDB against it. It started going really fast (even a bit faster than before), and then throttled down again to around 10Mb/sec. All sorts of expletives went through my head at the time. Sure enough, the same latching scenario was present. Oh well. I even spent some time trying to figure out if NUMA was hurting performance. Folks on Twitter made suggestions in this regard (thanks, Lonny! [twitter]) …Eureka? This past Friday I was still scratching my head about the whole thing; I was ready to start profiling with XPERF to see if I could figure out which part of the engine was to blame and then get Microsoft to look at the evidence. After getting a bunch of good news I’ll blog about separately, I sat down for a figurative smack down with CHECKDB before the weekend. And then the light bulb went on. A sparse column. I thought that I couldn’t possibly be experiencing the same scenario that Paul blogged about back in March showing extreme latching with non-clustered indexes on computed columns. Did I even have a non-clustered index on my sparse column? As it turns out, I did. I had one filtered non-clustered index – with the sparse column as the index key (and only column). To prove that this was the problem, I went and setup a test. Yup, that'll do it The repro is very simple for this issue: I tested it on the latest public builds of SQL Server 2008 R2 SP2 (CU6) and SQL Server 2012 SP1 (CU4). First, create a test database and a test table, which only needs to contain a sparse column: CREATE DATABASE SparseColTest; GO USE SparseColTest; GO CREATE TABLE testTable (testCol smalldatetime SPARSE NULL); GO INSERT INTO testTable (testCol) VALUES (NULL); GO 1000000 That’s 1 million rows, and even though you’re inserting NULLs, that’s going to take a while. In my laptop, it took 3 minutes and 31 seconds. Next, we run DBCC CHECKDB against the database: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; This runs extremely fast, as least on my test rig – 198 milliseconds. Now let’s create a filtered non-clustered index on the sparse column: CREATE NONCLUSTERED INDEX [badBadIndex] ON testTable (testCol) WHERE testCol IS NOT NULL; With the index in place now, let’s run DBCC CHECKDB one more time: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; In my test system this statement completed in 11433 milliseconds. 11.43 full seconds. Quite the jump from 198 milliseconds. I went ahead and dropped the filtered non-clustered indexes on the restored copy of our production database, and ran CHECKDB against that. We went down from 7+ days to 19 hours and 20 minutes. Cue the “Argenis is not impressed” meme, please, Mr. LaRock. My pain is your gain, folks. Go check to see if you have any of such indexes – they’re likely causing your consistency checks to run very, very slow. Happy CHECKDBing, -Argenis ps: I plan to file a Connect item for this issue – I consider it a pretty serious bug in the engine. After all, filtered indexes were invented BECAUSE of the sparse column feature – and it makes a lot of sense to use them together. Watch this space and my twitter timeline for a link.

Read the article
Explaining Explain Plan Notes for Auto DOP

- by jean-pierre.dijcks

I've recently gotten some questions around "why do I not see a parallel plan" while Auto DOP is on (I think)...? It is probably worthwhile to quickly go over some of the ways to find out what Auto DOP was thinking. In general, there is no need to go tracing sessions and look under the hood. The thing to start with is to do an explain plan on your statement and to look at the parameter settings on the system. Parameter Settings to Look At First and foremost, make sure that parallel_degree_policy = AUTO. If you have that parameter set to LIMITED you will not have queuing and we will only do the auto magic if your objects are set to default parallel (so no degree specified). Next you want to look at the value of parallel_degree_limit. It is typically set to CPU, which in default settings equates to the Default DOP of the system. If you are testing Auto DOP itself and the impact it has on performance you may want to leave it at this CPU setting. If you are running concurrent statements you may want to give this some more thoughts. See here for more information. In general, do stick with either CPU or with a specific number. For now avoid the IO setting as I've seen some mixed results with that... In 11.2.0.2 you should also check that IO Calibrate has been run. Best to simply do a: SQL> select * from V$IO_CALIBRATION_STATUS; STATUS CALIBRATION_TIME ------------- ---------------------------------------------------------------- READY 04-JAN-11 10.04.13.104 AM You should see that your IO Calibrate is READY and therefore Auto DOP is ready. In any case, if you did not run the IO Calibrate step you will get the following note in the explain plan: Note ----- - automatic DOP: skipped because of IO calibrate statistics are missing One more note on calibrate_io, if you do not have asynchronous IO enabled you will see: ERROR at line 1: ORA-56708: Could not find any datafiles with asynchronous i/o capability ORA-06512: at "SYS.DBMS_RMIN", line 463 ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1296 ORA-06512: at line 7 While this is changed in some fixes to the calibrate procedure, you should really consider switching asynchronous IO on for your data warehouse. Explain Plan Explanation To see the notes that are shown and explained here (and the above little snippet ) you can use a simple explain plan mechanism. There should be no need to add +parallel etc. explain plan for <statement> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); Auto DOP The note structure displaying why Auto DOP did not work (with the exception noted above on IO Calibrate) is like this: Automatic degree of parallelism is disabled: <reason> These are the reason codes: Parameter - parallel_degree_policy = manual which will not allow Auto DOP to kick in Hint - One of the following hints are used NOPARALLEL, PARALLEL(1), PARALLEL(MANUAL) Outline - A SQL outline of an older version (before 11.2) is used SQL property restriction - The statement type does not allow for parallel processing Rule-based mode - Instead of the Cost Based Optimizer the system is using the RBO Recursive SQL statement - The statement type does not allow for parallel processing pq disabled/pdml disabled/pddl disabled - For some reason (alter session?) parallelism is disabled Limited mode but no parallel objects referenced - your parallel_degree_policy = LIMITED and no objects in the statement are decorated with the default PARALLEL degree. In most cases all objects have a specific degree in which case Auto DOP will honor that degree. Parallel Degree Limited When Auto DOP does it works you may see the cap you imposed with parallel_degree_limit showing up in the note section of the explain plan: Note ----- - automatic DOP: Computed Degree of Parallelism is 16 because of degree limit This is an obvious indication that your are being capped for this statement. There is one quite interesting one that happens when you are being capped at DOP = 1. First of you get a serial plan and the note changes slightly in that it does not indicate it is being capped (we hope to update the note at some point in time to be more specific). It right now looks like this: Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Dynamic Sampling With 11.2.0.2 you will start seeing another interesting change in parallel plans, and since we are talking about the note section here, I figured we throw this in for good measure. If we deem the parallel (!) statement complex enough, we will enact dynamic sampling on your query. This happens as long as you did not change the default for dynamic sampling on the system. The note looks like this: Note ----- - dynamic sampling used for this statement (level=5)

Read the article
Big Data Appliance X4-2 Release Announcement

- by Jean-Pierre Dijcks

Today we are announcing the release of the 3rd generation Big Data Appliance. Read the Press Release here. Software Focus The focus for this 3rd generation of Big Data Appliance is: Comprehensive and Open - Big Data Appliance now includes all Cloudera Software, including Back-up and Disaster Recovery (BDR), Search, Impala, Navigator as well as the previously included components (like CDH, HBase and Cloudera Manager) and Oracle NoSQL Database (CE or EE). Lower TCO then DIY Hadoop Systems Simplified Operations while providing an open platform for the organization Comprehensive security including the new Audit Vault and Database Firewall software, Apache Sentry and Kerberos configured out-of-the-box Hardware Update A good place to start is to quickly review the hardware differences (no price changes!). On a per node basis the following is a comparison between old and new (X3-2) hardware: Big Data Appliance X3-2 Big Data Appliance X4-2 CPU 2 x 8-Core Intel® Xeon® E5-2660 (2.2 GHz) 2 x 8-Core Intel® Xeon® E5-2650 V2 (2.6 GHz) Memory 64GB 64GB Disk 12 x 3TB High Capacity SAS 12 x 4TB High Capacity SAS InfiniBand 40Gb/sec 40Gb/sec Ethernet 10Gb/sec 10Gb/sec For all the details on the environmentals and other useful information, review the data sheet for Big Data Appliance X4-2. The larger disks give BDA X4-2 33% more capacity over the previous generation while adding faster CPUs. Memory for BDA is expandable to 512 GB per node and can be done on a per-node basis, for example for NameNodes or for HBase region servers, or for NoSQL Database nodes. Software Details More details in terms of software and the current versions (note BDA follows a three monthly update cycle for Cloudera and other software): Big Data Appliance 2.2 Software Stack Big Data Appliance 2.3 Software Stack Linux Oracle Linux 5.8 with UEK 1 Oracle Linux 6.4 with UEK 2 JDK JDK 6 JDK 7 Cloudera CDH CDH 4.3 CDH 4.4 Cloudera Manager CM 4.6 CM 4.7 And like we said at the beginning it is important to understand that all other Cloudera components are now included in the price of Oracle Big Data Appliance. They are fully supported by Oracle and available for all BDA customers. For more information: Big Data Appliance Data Sheet Big Data Connectors Data Sheet Oracle NoSQL Database Data Sheet (CE | EE) Oracle Advanced Analytics Data Sheet

Read the article
Verification of UML Class Diagram

- by Jean Carlos Suárez Marranzini

This is my UML Class Diagram made in Astah Community, for a tennis scoreboard game. Here's a link to the image (I don't have enough rep to post images): http://i47.tinypic.com/2lsxx90.png Points are calculated based on moves. Moves can be either points (for the player's advantage) or errors (for the opponent's advantage). The Time Machine allows you to travel to previous game states (expressed as scoreboards). The storage component should be able to store matches independently of the serialization format. The serializers and deserializers should be able to do their job regardless of where the storage lies. The GameEngine should be able to apply the rules of the game regardless of the particularities of the game (hence, dependency injection through the Settings class). The outcomes of games, sets and matches should be deducible based on the points and the rules to apply (the logic implementations are there to provide the rules). Could you please verify my design and tell me if there's anything wrong with it? Thanks in advance.

Read the article

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >