Search Results

Search found 30190 results on 1208 pages for 'table row'.


  • Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

    - by Reed
    In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine.  If a problem can be decomposed based on the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine.  The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple.

    Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based on data.  Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection.  One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism.  In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library.  This should make it easier to investigate these topics in more detail.

    Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple.  Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter.  Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel data.  Once we know the minimum and maximum values, we most likely would have some simple code like the following:

        for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }

    This simple routine loops through a two-dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition.  Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns.  We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change.  Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL:

        Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } });

    Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine.  Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method.
    In this case, our for loop iteration block becomes a delegate created via a lambda expression.  This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime.

    We could easily do this to our second for loop as well, but that may not be a good idea.  There is a balance to be struck when writing parallel code.  We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce.  In this case, we have an image of data – most likely hundreds of pixels in both dimensions.  By just parallelizing our first loop, each row of pixels can be run as a single task.  With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks.  This introduces extra overhead with no extra gain, and will actually reduce our overall performance.  This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy.

    Also note that I parallelized the outer loop.  I could have just as easily partitioned the inner loop.  However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels).  My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop.  If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced.

    Parallel.For works great for situations where we know the number of elements we’re going to process in advance.  If we’re iterating through an IList<T> or an array, this is a typical approach.  However, there are other iteration statements common in C#.  In many situations, we’ll use foreach instead of a for loop.  This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance.

    As an example, let’s take the following situation.  Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change.  Normally, this might look something like:

        foreach(var customer in customers)
        {
            // Run some process that takes some time...
            DateTime lastContact = theStore.GetLastContact(customer);
            TimeSpan timeSinceContact = DateTime.Now - lastContact;

            // If it's been more than two weeks, send an email, and update...
            if (timeSinceContact.Days > 14)
            {
                theStore.EmailCustomer(customer);
                customer.LastEmailContact = DateTime.Now;
            }
        }

    Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist.
    If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach:

        Parallel.ForEach(customers, customer =>
        {
            // Run some process that takes some time...
            DateTime lastContact = theStore.GetLastContact(customer);
            TimeSpan timeSinceContact = DateTime.Now - lastContact;

            // If it's been more than two weeks, send an email, and update...
            if (timeSinceContact.Days > 14)
            {
                theStore.EmailCustomer(customer);
                customer.LastEmailContact = DateTime.Now;
            }
        });

    Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression.  This keeps our new code very similar to our original iteration statement; however, this will now execute in parallel.  The same guidelines apply with Parallel.ForEach as with Parallel.For.

    The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library.  These, however, are very easy to implement using Parallel.ForEach and the yield keyword, as sketched below.

    Most applications can benefit from implementing some form of Data Parallelism.  Iterating through collections and performing “work” is a very common pattern in nearly every application.  When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls.  Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.
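    The article stops short of showing the do/while trick itself, so here is a minimal sketch of the idea (my addition, not Reed’s code): wrap the while-style read loop in an iterator method that uses yield, then hand the resulting IEnumerable<T> to Parallel.ForEach. The file name and the per-line “work” are placeholder assumptions.

        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Threading.Tasks;

        static class WhileLoopParallelism
        {
            // Wraps a classic while loop (read until the stream ends) in an iterator,
            // so the items can be fed to Parallel.ForEach even though the element
            // count is unknown in advance.
            static IEnumerable<string> ReadLines(string path)
            {
                using (var reader = new StreamReader(path))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        yield return line;   // one item per iteration of the original while loop
                    }
                }
            }

            static void Main()
            {
                // Each line is processed concurrently; the "work" here is just a length check.
                Parallel.ForEach(ReadLines("customers.txt"), line =>
                {
                    if (line.Length > 0)
                    {
                        Console.WriteLine("{0} characters", line.Length);
                    }
                });
            }
        }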

    Read the article

  • Adding DTrace Probes to PHP Extensions

    - by cj
    The powerful DTrace tracing facility has some PHP-specific probes that can be enabled with --enable-dtrace. DTrace for Linux is being created by Oracle and is currently in tech preview. Currently it doesn't support userspace tracing so, in the meantime, Systemtap can be used to monitor the probes implemented in PHP. This was recently outlined in David Soria Parra's post Probing PHP with Systemtap on Linux. My post shows how DTrace probes can be added to PHP extensions and traced on Linux. I was using Oracle Linux 6.3.

    Not all Linux kernels are built with Systemtap, since this can impact stability. Check whether your running kernel (or others installed) has Systemtap enabled, and reboot with such a kernel:

        # grep CONFIG_UTRACE /boot/config-`uname -r`
        # grep CONFIG_UTRACE /boot/config-*

    When you install Systemtap itself, the package systemtap-sdt-devel is needed since it provides the sdt.h header file:

        # yum install systemtap-sdt-devel

    You can now install and build PHP as shown in David's article. Basically the build is with:

        $ cd ~/php-src
        $ ./configure --disable-all --enable-dtrace
        $ make

    (For me, running 'make' a second time failed with an error. The workaround is to do 'git checkout Zend/zend_dtrace.d' and then rerun 'make'. See PHP Bug 63704)

    David's article shows how to trace the probes already implemented in PHP. You can also use Systemtap to trace things like userspace PHP function calls. For example, create test.php:

        <?php
        $c = oci_connect('hr', 'welcome', 'localhost/orcl');
        $s = oci_parse($c, "select dbms_xmlgen.getxml('select * from dual') xml from dual");
        $r = oci_execute($s);
        $row = oci_fetch_array($s, OCI_NUM);
        $x = $row[0]->load();
        $row[0]->free();
        echo $x;
        ?>

    The normal output of this file is the XML form of Oracle's DUAL table:

        $ ./sapi/cli/php ~/test.php
        <?xml version="1.0"?>
        <ROWSET>
        <ROW>
        <DUMMY>X</DUMMY>
        </ROW>
        </ROWSET>

    To trace the PHP function calls, create the tracing file functrace.stp:

        probe process("sapi/cli/php").function("zif_*") { printf("Started function %s\n", probefunc()); }
        probe process("sapi/cli/php").function("zif_*").return { printf("Ended function %s\n", probefunc()); }

    This makes use of the way PHP userspace functions (not builtins) like oci_connect() map to C functions with a "zif_" prefix. Login as root, and run Systemtap on the PHP script:

        # cd ~cjones/php-src
        # stap -c 'sapi/cli/php ~cjones/test.php' ~cjones/functrace.stp
        Started function zif_oci_connect
        Ended function zif_oci_connect
        Started function zif_oci_parse
        Ended function zif_oci_parse
        Started function zif_oci_execute
        Ended function zif_oci_execute
        Started function zif_oci_fetch_array
        Ended function zif_oci_fetch_array
        Started function zif_oci_lob_load
        <?xml version="1.0"?>
        <ROWSET>
        <ROW>
        <DUMMY>X</DUMMY>
        </ROW>
        </ROWSET>
        Ended function zif_oci_lob_load
        Started function zif_oci_free_descriptor
        Ended function zif_oci_free_descriptor

    Each call and return is logged. The Systemtap scripting language allows complex scripts to be built. There are many examples on the web. To augment this generic capability and the PHP probes in PHP, other extensions can have probes too. Below are the steps I used to add probes to OCI8: I created a provider file ext/oci8/oci8_dtrace.d, enabling three probes. The first one will accept a parameter that runtime tracing can later display:

        provider php {
            probe oci8__connect(char *username);
            probe oci8__nls_start();
            probe oci8__nls_done();
        };

    I updated ext/oci8/config.m4 with the PHP_INIT_DTRACE macro. The patch is at the end of config.m4.
The macro takes the provider prototype file, a name of the header file that 'dtrace' will generate, and a list of sources files with probes. When --enable-dtrace is used during PHP configuration, then the outer $PHP_DTRACE check is true and my new probes will be enabled. I've chosen to define an OCI8 specific macro, HAVE_OCI8_DTRACE, which can be used in the OCI8 source code: diff --git a/ext/oci8/config.m4 b/ext/oci8/config.m4 index 34ae76c..f3e583d 100644 --- a/ext/oci8/config.m4 +++ b/ext/oci8/config.m4 @@ -341,4 +341,17 @@ if test "$PHP_OCI8" != "no"; then PHP_SUBST_OLD(OCI8_ORACLE_VERSION) fi + + if test "$PHP_DTRACE" = "yes"; then + AC_CHECK_HEADERS([sys/sdt.h], [ + PHP_INIT_DTRACE([ext/oci8/oci8_dtrace.d], + [ext/oci8/oci8_dtrace_gen.h],[ext/oci8/oci8.c]) + AC_DEFINE(HAVE_OCI8_DTRACE,1, + [Whether to enable DTrace support for OCI8 ]) + ], [ + AC_MSG_ERROR( + [Cannot find sys/sdt.h which is required for DTrace support]) + ]) + fi + fi In ext/oci8/oci8.c, I added the probes at, for this example, semi-arbitrary places: diff --git a/ext/oci8/oci8.c b/ext/oci8/oci8.c index e2241cf..ffa0168 100644 --- a/ext/oci8/oci8.c +++ b/ext/oci8/oci8.c @@ -1811,6 +1811,12 @@ php_oci_connection *php_oci_do_connect_ex(char *username, int username_len, char } } +#ifdef HAVE_OCI8_DTRACE + if (DTRACE_OCI8_CONNECT_ENABLED()) { + DTRACE_OCI8_CONNECT(username); + } +#endif + /* Initialize global handles if they weren't initialized before */ if (OCI_G(env) == NULL) { php_oci_init_global_handles(TSRMLS_C); @@ -1870,11 +1876,22 @@ php_oci_connection *php_oci_do_connect_ex(char *username, int username_len, char size_t rsize = 0; sword result; +#ifdef HAVE_OCI8_DTRACE + if (DTRACE_OCI8_NLS_START_ENABLED()) { + DTRACE_OCI8_NLS_START(); + } +#endif PHP_OCI_CALL_RETURN(result, OCINlsEnvironmentVariableGet, (&charsetid_nls_lang, 0, OCI_NLS_CHARSET_ID, 0, &rsize)); if (result != OCI_SUCCESS) { charsetid_nls_lang = 0; } smart_str_append_unsigned_ex(&hashed_details, charsetid_nls_lang, 0); + +#ifdef HAVE_OCI8_DTRACE + if (DTRACE_OCI8_NLS_DONE_ENABLED()) { + DTRACE_OCI8_NLS_DONE(); + } +#endif } timestamp = time(NULL); The oci_connect(), oci_pconnect() and oci_new_connect() calls all use php_oci_do_connect_ex() internally. The first probe simply records that the PHP application made a connection call. I already showed a way to do this without needing a probe, but adding a specific probe lets me record the username. The other two probes can be used to time how long the globalization initialization takes. The relationships between the oci8_dtrace.d names like oci8__connect, the probe guards like DTRACE_OCI8_CONNECT_ENABLED() and probe names like DTRACE_OCI8_CONNECT() are obvious after seeing the pattern of all three probes. I included the new header that will be automatically created by the dtrace tool when PHP is built. I did this in ext/oci8/php_oci8_int.h: diff --git a/ext/oci8/php_oci8_int.h b/ext/oci8/php_oci8_int.h index b0d6516..c81fc5a 100644 --- a/ext/oci8/php_oci8_int.h +++ b/ext/oci8/php_oci8_int.h @@ -44,6 +44,10 @@ # endif # endif /* osf alpha */ +#ifdef HAVE_OCI8_DTRACE +#include "oci8_dtrace_gen.h" +#endif + #if defined(min) #undef min #endif Now PHP can be rebuilt: $ cd ~/php-src $ rm configure && ./buildconf --force $ ./configure --disable-all --enable-dtrace \ --with-oci8=instantclient,/home/cjones/instantclient $ make If 'make' fails, do the 'git checkout Zend/zend_dtrace.d' trick I mentioned. 
    The new probes can be seen by logging in as root and running:

        # stap -l 'process.provider("php").mark("oci8*")' -c 'sapi/cli/php -i'
        process("sapi/cli/php").provider("php").mark("oci8__connect")
        process("sapi/cli/php").provider("php").mark("oci8__nls_done")
        process("sapi/cli/php").provider("php").mark("oci8__nls_start")

    To test them out, create a new trace file, oci.stp:

        global numconnects;
        global start;
        global numcharlookups = 0;
        global tottime = 0;

        probe process.provider("php").mark("oci8-connect")
        {
            printf("Connected as %s\n", user_string($arg1));
            numconnects += 1;
        }

        probe process.provider("php").mark("oci8-nls_start")
        {
            start = gettimeofday_us();
            numcharlookups++;
        }

        probe process.provider("php").mark("oci8-nls_done")
        {
            tottime += gettimeofday_us() - start;
        }

        probe end
        {
            printf("Connects: %d, Charset lookups: %ld\n", numconnects, numcharlookups);
            printf("Total NLS charset initialization time: %ld usecs/connect\n",
                   (numcharlookups > 0 ? tottime/numcharlookups : 0));
        }

    This calculates the average time that the NLS character set lookup takes. It also prints out the username of each connection, as an example of using parameters. Login as root and run Systemtap over the PHP script:

        # cd ~cjones/php-src
        # stap -c 'sapi/cli/php ~cjones/test.php' ~cjones/oci.stp
        Connected as cj
        <?xml version="1.0"?>
        <ROWSET>
        <ROW>
        <DUMMY>X</DUMMY>
        </ROW>
        </ROWSET>
        Connects: 1, Charset lookups: 1
        Total NLS charset initialization time: 164 usecs/connect

    This shows the time penalty of making OCI8 look up the default character set. This time would be zero if a character set had been passed as the fourth argument to oci_connect() in test.php.

    Read the article

  • SQL SERVER – Enable Identity Insert – Import Export Wizard

    - by pinaldave
    I recently got an email from an old friend who told me that when he tries to execute an SSIS package it fails with an identity error. After some debugging and opening his package, we figured out that he had the following issue. Let us see what kind of setup he had in his package: a source table with an identity column, a destination table with an identity column, and the following checkbox disabled in the Import Export Wizard (as per the image below). What we did was enable the checkbox described above, and that fixed the problem he was having with inserting into the identity column. The reason he was facing this error is that his destination table had the IDENTITY property, which will not allow any insert from the user. This value is automatically generated by the system when new values are inserted in the table. However, when a user manually tries to insert a value in the table, it stops them and throws an error. As we enabled the checkbox “Enable Identity Insert”, this feature allowed the values to be inserted in the identity field, and this way the exact identity values from the source database were moved to the destination table. Let me know if this blog post was easy to understand. Reference: Pinal Dave (http://blog.SQLAuthority.com), Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology
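    For anyone scripting the same move in code rather than through the wizard, here is a small C# sketch (my addition, not from Pinal’s post) of the equivalent idea: SqlBulkCopyOptions.KeepIdentity plays the same role as the “Enable Identity Insert” checkbox, so the source identity values arrive unchanged at the destination. The table name, column list and connection strings are placeholder assumptions.

        using System.Data;
        using System.Data.SqlClient;

        class IdentityPreservingCopy
        {
            static void Copy(string sourceConnString, string destConnString)
            {
                using (var source = new SqlConnection(sourceConnString))
                using (var cmd = new SqlCommand("SELECT CustomerID, Name FROM dbo.Customers", source))
                {
                    source.Open();
                    using (IDataReader reader = cmd.ExecuteReader())
                    // KeepIdentity = write the source identity values as-is instead of
                    // letting the destination generate new ones.
                    using (var bulk = new SqlBulkCopy(destConnString, SqlBulkCopyOptions.KeepIdentity))
                    {
                        bulk.DestinationTableName = "dbo.Customers";
                        bulk.WriteToServer(reader);
                    }
                }
            }
        }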

    Read the article

  • Zipcodes in CSV Generation

    - by BRADINO
    When exporting to CSV format and then opening the file in a spreadsheet program like Excel, zipcodes that start with one or more zeroes have the leading zeros stripped off. This is because the spreadsheet treats that column as integers, and leading zeros in integers are meaningless. A quick and dirty trick to force Excel (hopefully you are using OpenOffice) to display the full zipcode is to wrap it in double quotes and put an equal sign in front of it, forcing it to be treated as a string, like this:

        $zipcode = '00123';
        $data = '="' . $zipcode . '"';

    So if you are doing a straight query-to-CSV export using the fputcsv function, it would look something like this. Basically just overwrite the value in the row and then continue along:

        while ($row = mysql_fetch_assoc($query)) {
            $row['zipcode'] = '="' . $row['zipcode'] . '"';
            fputcsv($output, $row);
        }
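    The same trick carries over to other languages; here is a small C# sketch (my addition, not part of the original tip) that writes the ="..." wrapper by hand. The file name and sample rows are made up, and note that inside a quoted CSV field the embedded double quotes have to be doubled.

        using System.Collections.Generic;
        using System.IO;

        class ZipCodeCsv
        {
            // Wraps a zip code in the ="..." form so spreadsheets keep leading zeros.
            // Inside a quoted CSV field, embedded double quotes must be doubled,
            // so the raw field for 00123 ends up as: "=""00123"""
            static string ExcelSafe(string zip)
            {
                return "\"=\"\"" + zip + "\"\"\"";
            }

            static void Main()
            {
                var rows = new List<string>
                {
                    "name,zipcode",
                    "Alice," + ExcelSafe("00123"),
                    "Bob," + ExcelSafe("07302")
                };
                File.WriteAllLines("customers.csv", rows);
            }
        }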

    Read the article

  • SQL SERVER – Importing CSV File Into Database – SQL in Sixty Seconds #018 – Video

    - by pinaldave
    Importing data into a database is one of the most important tasks. I often receive questions regarding what the quickest way to insert CSV data is, or how to import CSV data into a SQL Server table. Honestly, the process is very simple and the script is even simpler. In today’s SQL in Sixty Seconds video we will learn how quickly we can insert CSV data into SQL Server. The steps to import CSV are very simple: create the table, use BULK INSERT to import the data, verify the data, and done! Absolutely it is that simple. More on Importing CSV Data:

    SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server
    SQL SERVER – Import CSV File into Database Table Using SSIS
    SQL SERVER – Create a Comma Delimited List Using SELECT Clause From Table Column
    SQL SERVER – Comma Separated Values (CSV) from Table Column
    SQL SERVER – Comma Separated Values (CSV) from Table Column – Part 2

    I encourage you to submit your ideas for SQL in Sixty Seconds. We will try to accommodate as many as we can. If we like your idea we promise to share with you educational material. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Database, Pinal Dave, PostADay, SQL, SQL Authority, SQL in Sixty Seconds, SQL Query, SQL Scripts, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL, Technology, Video
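    As a rough companion to the Create Table / Bulk Insert / Verify steps above, here is a hedged C# sketch that simply runs the same T-SQL BULK INSERT from an application and then counts the loaded rows. The table name, CSV path and connection string are placeholder assumptions, and the file path must be visible to the SQL Server service itself.

        using System;
        using System.Data.SqlClient;

        class CsvBulkImport
        {
            static void Import(string connectionString)
            {
                // Assumes dbo.ImportedData already exists with columns matching the CSV layout.
                const string sql = @"
                    BULK INSERT dbo.ImportedData
                    FROM 'C:\temp\data.csv'
                    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);";

                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(sql, conn))
                {
                    conn.Open();
                    cmd.ExecuteNonQuery();

                    // Verify the data: count what landed in the table.
                    cmd.CommandText = "SELECT COUNT(*) FROM dbo.ImportedData";
                    int rows = (int)cmd.ExecuteScalar();
                    Console.WriteLine("{0} rows imported", rows);
                }
            }
        }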

    Read the article

  • Beware Sneaky Reads with Unique Indexes

    - by Paul White NZ
    A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for.  In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column…

    Example Table

    Here’s a T-SQL script to create that table and populate it with 1,000 rows:

        CREATE TABLE dbo.LOBtest
        (
            pk INTEGER IDENTITY NOT NULL,
            some_value INTEGER NULL,
            lob_data VARCHAR(MAX) NULL,
            another_column CHAR(5) NULL,
            CONSTRAINT [PK dbo.LOBtest pk]
                PRIMARY KEY CLUSTERED (pk ASC)
        );
        GO
        DECLARE @Data VARCHAR(MAX);
        SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);

        WITH Numbers (n) AS
        (
            SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))
            FROM master.sys.columns C1, master.sys.columns C2
        )
        INSERT LOBtest WITH (TABLOCKX)
        (
            some_value,
            lob_data
        )
        SELECT TOP (1000)
            N.n,
            @Data
        FROM Numbers N
        WHERE N.n <= 1000;

    Test 1: A Simple Update

    Let’s run a query to subtract one from every value in the some_value column:

        UPDATE dbo.LOBtest WITH (TABLOCKX)
        SET some_value = some_value - 1;

    As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources.  The STATISTICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time.  The query plan is also very simple: Looking at the Clustered Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan: The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed.  The some_value column is used by the Compute Scalar to calculate the new value.  (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT).

    Test 2: Simple Update with an Index

    Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column:

        CREATE NONCLUSTERED INDEX [IX dbo.LOBtest some_value (lob_data)]
        ON dbo.LOBtest (some_value)
        INCLUDE (lob_data)
        WITH (FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON);

    This is not a useful index for our simple update query; imagine that someone else created it for a different purpose.  Let’s run our update query again:

        UPDATE dbo.LOBtest WITH (TABLOCKX)
        SET some_value = some_value - 1;

    We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms.  The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index. The query plan is very similar to before (click to enlarge): The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index. The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update.  SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post on non-updating updates for details).  Our simple query does change the value of some_value in every row, so this optimization doesn’t add any value in this specific case. The output list of columns from the Clustered Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_value columns.  Cool.
    Overall then, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table.  Let’s see what happens if we make the nonclustered index unique.

    Test 3: Simple Update with a Unique Index

    Here’s the script to create a new unique index, and drop the old one:

        CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.LOBtest some_value (lob_data)]
        ON dbo.LOBtest (some_value)
        INCLUDE (lob_data)
        WITH (FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON);
        GO
        DROP INDEX [IX dbo.LOBtest some_value (lob_data)]
        ON dbo.LOBtest;

    Remember that SQL Server only enforces uniqueness on index keys (the some_value column).  The lob_data column is simply stored at the leaf-level of the non-clustered index.  With that in mind, we might expect this change to make very little difference.  Let’s see:

        UPDATE dbo.LOBtest WITH (TABLOCKX)
        SET some_value = some_value - 1;

    Whoa!  Now look at the elapsed time and logical reads:

        Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0,
        lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.

        CPU time = 172 ms, elapsed time = 16172 ms.

    Even with all the data and index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead). Why on earth is SQL Server reading LOB data in a query that only updates a single integer column?

    The Query Plan

    The query plan for test 3 looks a bit more complex than before: In fact, the bottom level is exactly the same as we saw with the non-unique index.  The top level has heaps of new stuff though, which I’ll come to in a moment. You might be expecting to find that the Clustered Index Scan is now reading the lob_data column (for some reason).  After all, we need to explain where all the LOB logical reads are coming from.  Sadly, when we look at the properties of the Clustered Index Scan, we see exactly the same as before: SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads?

    Updates that Sneakily Read Data

    We have to go as far as the Clustered Index Update operator before we see LOB data in the output list: [Expr1020] is a bit flag added by an earlier Compute Scalar.  It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier). The Clustered Index Update operator adds two new columns: the lob_data column, and some_value_OLD.  The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column.  At this point, the clustered index has already been updated with the new value, but we haven’t touched the nonclustered index yet. An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its update operation.  SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered Index Update.  The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then.  Sneaky!

    Why the LOB Data Is Needed

    This is all very interesting (I hope), but why is SQL Server reading the LOB data?  For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update? The answer relates to the top row of the query plan for test 3.
I’ll reproduce it here for convenience: Notice that this is a wide (per-index) update plan.  SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index too.  I’ll talk more about this difference shortly. The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient.  It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered index. Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3.  If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4.  The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4. By applying net changes, SQL Server can also avoid false unique-key violations.  If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail.  You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down.  That’s fine, but it’s not possible to generalize this logic to work with every possible update query. SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations.  It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits.  As a result, you will often receive a wide update plan, even when you can see that no violations are possible. Another benefit of this optimization is that it includes a sort on the index key as part of its work.  Processing the index changes in index key order promotes sequential I/O against the nonclustered index. A side-effect of all this is that the net changes might include one or more inserts.  In order to insert a new row in the index, SQL Server obviously needs all the columns – the key column and the included LOB column.  This is the reason SQL Server reads the LOB data as part of the Clustered Index Update. In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs).  In order to generate the correct index key delete operation, it needs the old key value. The irony is that in this case the Split/Sort/Collapse optimization is anything but.  Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it. Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates. Beating the Set-Based Update with a Cursor One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated.  
Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data: SET NOCOUNT ON; SET STATISTICS XML, IO, TIME OFF;   DECLARE @PK INTEGER, @StartTime DATETIME; SET @StartTime = GETUTCDATE();   DECLARE curUpdate CURSOR LOCAL FORWARD_ONLY KEYSET SCROLL_LOCKS FOR SELECT L.pk FROM LOBtest L ORDER BY L.pk ASC;   OPEN curUpdate;   WHILE (1 = 1) BEGIN FETCH NEXT FROM curUpdate INTO @PK;   IF @@FETCH_STATUS = -1 BREAK; IF @@FETCH_STATUS = -2 CONTINUE;   UPDATE dbo.LOBtest SET some_value = some_value - 1 WHERE CURRENT OF curUpdate; END;   CLOSE curUpdate; DEALLOCATE curUpdate;   SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE()); That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!) I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it.  One could just as well use a WHERE clause that specified the primary key value instead. Clustered Indexes A clustered index is the ultimate index with included columns: all non-key columns are included columns in a clustered index.  Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes: IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL DROP TABLE dbo.LOBtest; GO CREATE TABLE dbo.LOBtest ( pk INTEGER NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);   WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( pk, some_value, lob_data ) SELECT TOP (1000) N.n, N.n, @Data FROM Numbers N WHERE N.n <= 1000; Now here’s a query to modify the cluster keys: UPDATE dbo.LOBtest SET pk = pk + 1; The query plan is: As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection.  In addition, SQL Server now has no choice but to read the LOB data in the Clustered Index Scan: The performance is not great, as you might expect (even though there is no non-clustered index to maintain): Table 'LOBtest'. Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.   Table 'Worktable'. Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0, lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.   SQL Server Execution Times: CPU time = 483 ms, elapsed time = 17884 ms. Notice how the LOB data is read twice: once from the Clustered Index Scan, and again from the work table in tempdb used by the Eager Spool. If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns): A unique non-clustered index (on a heap) works well too: Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads.  (In fact the heap is rather more efficient). There are lots more fun combinations to try that I don’t have space for here. Final Thoughts The behaviour shown in this post is not limited to LOB data by any means.  
If the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

    Read the article

  • SSIS Design Pattern: Loading Variable-Length Rows

    - by andyleonard
    Introduction: I encounter flat file sources with variable-length rows on occasion. Here, I supply one SSIS Design Pattern for loading them. What's a Variable-Length Row Flat File? Great question - let's start with a definition. A variable-length row flat file is a text source of some flavor - comma-separated values (CSV), tab-delimited file (TDF), or even fixed-length, positional-, or ordinal-based (where the location of the data on the row defines its field). The major difference between a "normal"...(read more)
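    The excerpt is cut off before the pattern itself, so purely as an illustrative sketch (mine, not Andy's), this is the kind of field-count branching a C# script component or custom loader typically applies to a variable-length row file; the comma delimiter and the 3-field/5-field record layouts are assumptions.

        using System;
        using System.IO;

        class VariableLengthRowLoader
        {
            // Reads a delimited file whose rows carry different numbers of fields and
            // routes each row by its field count, the same branching an SSIS Script
            // Component or Conditional Split would perform for this pattern.
            static void Load(string path)
            {
                foreach (string line in File.ReadLines(path))
                {
                    string[] fields = line.Split(',');
                    switch (fields.Length)
                    {
                        case 3:
                            Console.WriteLine("Header row for order {0}", fields[1]);
                            break;
                        case 5:
                            Console.WriteLine("Detail row, amount {0}", fields[4]);
                            break;
                        default:
                            Console.WriteLine("Unexpected row with {0} fields", fields.Length);
                            break;
                    }
                }
            }
        }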

    Read the article

  • Know more about Enqueue Deadlock Detection

    - by Liu Maclean(???)
    ??? ORACLE ALLSTAR???????????????????,??????? ???????enqueue lock?????????3 ??????,????????????????????????????ora-00060 dead lock??process???3s: SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi PL/SQL Release 10.2.0.5.0 - Production CORE 10.2.0.5.0 Production TNS for Linux: Version 10.2.0.5.0 - Production NLSRTL Version 10.2.0.5.0 - Production SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.oracledatabase12g.com PROCESS A: set timing on; update maclean1 set t1=t1+1; PROCESS B: update maclean2 set t1=t1+1; PROCESS A: update maclean2 set t1=t1+1; PROCESS B: update maclean1 set t1=t1+1; ??3s? PROCESS A ?? ERROR at line 1: ORA-00060: deadlock detected while waiting for resource Elapsed: 00:00:03.02 ????Process A????????????? 3s,?????????????,??????? ?????????? ???????: SQL> col name for a30 SQL> col value for a5 SQL> col DESCRIB for a50 SQL> set linesize 140 pagesize 1400 SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ 2 FROM SYS.x$ksppi x, SYS.x$ksppcv y 3 WHERE x.inst_id = USERENV ('Instance') 4 AND y.inst_id = USERENV ('Instance') 5 AND x.indx = y.indx 6 AND x.ksppinm='_enqueue_deadlock_scan_secs'; NAME VALUE DESCRIB ------------------------------ ----- -------------------------------------------------- _enqueue_deadlock_scan_secs 0 deadlock scan interval SQL> alter system set "_enqueue_deadlock_scan_secs"=18 scope=spfile; System altered. Elapsed: 00:00:00.01 SQL> startup force; ORACLE instance started. Total System Global Area 851443712 bytes Fixed Size 2100040 bytes Variable Size 738198712 bytes Database Buffers 104857600 bytes Redo Buffers 6287360 bytes Database mounted. Database opened. PROCESS A: SQL> set timing on; SQL> update maclean1 set t1=t1+1; 1 row updated. Elapsed: 00:00:00.06 Process B SQL> update maclean2 set t1=t1+1; 1 row updated. SQL> update maclean1 set t1=t1+1; Process A: SQL> SQL> alter session set events '10704 trace name context forever,level 10:10046 trace name context forever,level 8'; Session altered. SQL> update maclean2 set t1=t1+1; update maclean2 set t1=t1+1 * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource  Elapsed: 00:00:18.05 ksqcmi: TX,90011,4a9 mode=6 timeout=21474836 WAIT #12: nam='enq: TX - row lock contention' ela= 2930070 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114759849120 WAIT #12: nam='enq: TX - row lock contention' ela= 2930636 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114762779801 WAIT #12: nam='enq: TX - row lock contention' ela= 2930439 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114765710430 *** 2012-06-12 09:58:43.089 WAIT #12: nam='enq: TX - row lock contention' ela= 2931698 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114768642192 WAIT #12: nam='enq: TX - row lock contention' ela= 2930428 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114771572755 WAIT #12: nam='enq: TX - row lock contention' ela= 2931408 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114774504207 DEADLOCK DETECTED ( ORA-00060 ) [Transaction Deadlock] The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. 
The following information may aid in determining the deadlock: ??????Process A?’enq: TX – row lock contention’ ?????ORA-00060 deadlock detected????3s ??? 18s , ???hidden parameter “_enqueue_deadlock_scan_secs”?????,????????0? ??????????: SQL> alter system set "_enqueue_deadlock_scan_secs"=4 scope=spfile; System altered. Elapsed: 00:00:00.01 SQL> alter system set "_enqueue_deadlock_time_sec"=9 scope=spfile; System altered. Elapsed: 00:00:00.00 SQL> startup force; ORACLE instance started. Total System Global Area 851443712 bytes Fixed Size 2100040 bytes Variable Size 738198712 bytes Database Buffers 104857600 bytes Redo Buffers 6287360 bytes Database mounted. Database opened. SQL> set linesize 140 pagesize 1400 SQL> show parameter dead NAME TYPE VALUE ------------------------------------ -------------------------------- ------------------------------ _enqueue_deadlock_scan_secs integer 4 _enqueue_deadlock_time_sec integer 9 SQL> set timing on SQL> select * from maclean1 for update wait 8; T1 ---------- 11 Elapsed: 00:00:00.01 PROCESS B SQL> select * from maclean2 for update wait 8; T1 ---------- 3 SQL> select * from maclean1 for update wait 8; select * from maclean1 for update wait 8 PROCESS A SQL> select * from maclean2 for update wait 8; select * from maclean2 for update wait 8 * ERROR at line 1: ORA-30006: resource busy; acquire with WAIT timeout expired Elapsed: 00:00:08.00 ???????? ??? select for update wait?enqueue request timeout ?????8s? ,???????”_enqueue_deadlock_scan_secs”=4(deadlock scan interval),?4s???deadlock detected,????Process A????deadlock ???, ??????? ??Process A?????8s?raised??”ORA-30006: resource busy; acquire with WAIT timeout expired”??,??ORA-00060,?????process A???????? ????????”_enqueue_deadlock_time_sec”(requests with timeout <= this will not have deadlock detection)???,?enqueue request time < “_enqueue_deadlock_time_sec”?Server process?????dead lock detection,?????????enqueue request ??????timeout??????(_enqueue_deadlock_time_sec????5,?timeout<5s),???????????????;??????timeout>”_enqueue_deadlock_time_sec”???,Oracle????????????????????? ??????????: SQL> show parameter dead NAME TYPE VALUE ------------------------------------ -------------------------------- ------------------------------ _enqueue_deadlock_scan_secs integer 4 _enqueue_deadlock_time_sec integer 9 Process A: SQL> set timing on; SQL> select * from maclean1 for update wait 10; T1 ---------- 11 Process B: SQL> select * from maclean2 for update wait 10; T1 ---------- 3 SQL> select * from maclean1 for update wait 10; PROCESS A: SQL> select * from maclean2 for update wait 10; select * from maclean2 for update wait 10 * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource Elapsed: 00:00:06.02 ??????? select for update wait 10?10s??, ?? 10s?????_enqueue_deadlock_time_sec???(9s),??Process A???????? ???????????????6s ???????_enqueue_deadlock_scan_secs?4s ? ???????????,???????????_enqueue_deadlock_scan_secs?????????3???? ??: enqueue lock?????????????? 1. ?????????deadlock detection??3s????, ????????_enqueue_deadlock_scan_secs(deadlock scan interval)???,??????0,????????_enqueue_deadlock_scan_secs?????????3???, ?_enqueue_deadlock_scan_secs=0 ??3s??, ?_enqueue_deadlock_scan_secs=4??6s??,????? 2. 
???????_enqueue_deadlock_time_sec(requests with timeout <= this will not have deadlock detection)???,?enqueue request timeout< _enqueue_deadlock_time_sec(????5),?Server process?????????enqueue request timeout>_enqueue_deadlock_time_sec ????_enqueue_deadlock_scan_secs???????, ??request timeout??????select for update wait [TIMEOUT]??? ??: ???10.2.0.1?????????2?hidden parameter , ???patchset 10.2.0.3????? _enqueue_deadlock_time_sec, ?patchset 10.2.0.5??????_enqueue_deadlock_scan_secs? ?????RAC???????????10s, ???????_lm_dd_interval(dd time interval in seconds) ,????????8.0.6???? ???????????????,??????,  ?10g???????60s,?11g???????10s?  ???????11g??_lm_dd_interval?????????????,?????11g??LMD????????????,??????????RAC?LMD?Deadlock Detection???????CPU,???11g?Oracle????Team???LMD????????CPU????: ????????11g?LMD???????,???????11g??? UTS TRACE ????? DD???: SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> SQL> select * from global_name 2 ; GLOBAL_NAME -------------------------------------------------------------------------------- www.oracledatabase12g.com SQL> alter system set "_lm_dd_interval"=20 scope=spfile; System altered. SQL> startup force; ORACLE instance started. Total System Global Area 1570009088 bytes Fixed Size 2228704 bytes Variable Size 1325403680 bytes Database Buffers 234881024 bytes Redo Buffers 7495680 bytes Database mounted. Database opened. SQL> set linesize 140 pagesize 1400 SQL> show parameter lm_dd NAME TYPE VALUE ------------------------------------ -------------------------------- ------------------------------ _lm_dd_interval integer 20 SQL> select count(*) from gv$instance; COUNT(*) ---------- 2 instance 1: SQL> oradebug setorapid 12 Oracle pid: 12, Unix process pid: 8608, image: [email protected] (LMD0) ? LMD0??? UTS TRACE??RAC???????????? SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high; Statement processed. Elapsed: 00:00:00.00 SQL> update maclean1 set t1=t1+1; 1 row updated. instance 2: SQL> update maclean2 set t1=t1+1; 1 row updated. 
SQL> update maclean1 set t1=t1+1; Instance 1: SQL> update maclean2 set t1=t1+1; update maclean2 set t1=t1+1 * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource Elapsed: 00:00:20.51 LMD0???UTS TRACE 2012-06-12 22:27:00.929284 : [kjmpbmsg:process][type 22][msg 0x7fa620ac85a8][from 1][seq 8148.0][len 192] 2012-06-12 22:27:00.929346 : [kjmxmpm][type 22][seq 0.0][msg 0x7fa620ac85a8][from 1] *** 2012-06-12 22:27:00.929 * kjddind: received DDIND msg with subtype x6 * reqp->dd_master_inst_kjxmddi == 1 * kjddind: dump sgh: 2012-06-12 22:27:00.929346*: kjddind: req->timestamp [0.15], kjddt [0.13] 2012-06-12 22:27:00.929346*: >> DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1 2012-06-12 22:27:00.929346*: lock [0x95023930,829], op = [mast] 2012-06-12 22:27:00.929346*: reqp->timestamp [0.15], kjddt [0.13] 2012-06-12 22:27:00.929346*: kjddind: updated local timestamp [0.15] * kjddind: case KJX_DD_REMOTE 2012-06-12 22:27:00.929346*: ADD IO NODE WFG: 0 frame pointer 2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: POP: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: kjddopr[TX 0xe000c.0x32][ext 0x5,0x0]: blocking lock 0xbbb9a800, owner 2097154 of inst 2 2012-06-12 22:27:00.929346*: PUSH: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: PROCESS: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: POP: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: kjddopt: converting lock 0xbbce92f8 on 'TX' 0x80016.0x5d4,txid [2097154,34]of inst 2 2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1 2012-06-12 22:27:00.929855 : GSIPC:AMBUF: rcv buff 0x7fa620aa8cd8, pool rcvbuf, rqlen 1102 2012-06-12 22:27:00.929878 : GSIPC:GPBMSG: new bmsg 0x7fa620aa8d48 mb 0x7fa620aa8cd8 msg 0x7fa620aa8d68 mlen 192 dest x100 flushsz -1 2012-06-12 22:27:00.929878*: << DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 2->1,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1 2012-06-12 22:27:00.929878*: lock [0xbbce92f8,287], op = [mast] 2012-06-12 22:27:00.929878*: ADD IO NODE WFG: 0 frame pointer 2012-06-12 22:27:00.929923 : [kjmpbmsg:compl][msg 0x7fa620ac8588][typ p][nmsgs 1][qtime 0][ptime 0] 2012-06-12 22:27:00.929947 : GSIPC:PBAT: flush start. 
flag 0x79 end 0 inc 4.4 2012-06-12 22:27:00.929963 : GSIPC:PBAT: send bmsg 0x7fa620aa8d48 blen 224 dest 1.0 2012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1 012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1 2012-06-12 22:27:00.929996 : GSIPC:BSEND: flushing sndq 0xb491dd28, id 0, dcx 0xbc517770, inst 1, rcvr 0 qlen 0 1 2012-06-12 22:27:00.930014 : GSIPC:BSEND: no batch1 msg 0x7fa620aa8d48 type 65521 len 224 dest (1:0) 2012-06-12 22:27:00.930088 : kjbsentscn[0x0.3f72dc][to 1] 2012-06-12 22:27:00.930144 : GSIPC:SENDM: send msg 0x7fa620aa8d48 dest x10000 seq 8325 type 65521 tkts x1 mlen xe00110 2012-06-12 22:27:00.930531 : GSIPC:KSXPCB: msg 0x7fa620aa8d48 status 30, type 65521, dest 1, rcvr 0 WAIT #0: nam='ges remote message' ela= 1372 waittime=80 loop=0 p3=74 obj#=-1 tim=1339554420931640 2012-06-12 22:27:00.931728 : GSIPC:RCVD: ksxp msg 0x7fa620af6490 sndr 1 seq 0.8149 type 65521 tkts 1 2012-06-12 22:27:00.931746 : GSIPC:RCVD: watq msg 0x7fa620af6490 sndr 1, seq 8149, type 65521, tkts 1 2012-06-12 22:27:00.931763 : GSIPC:RCVD: seq update (0.8148)->(0.8149) tp -15 fg 0x4 from 1 pbattr 0x0 2012-06-12 22:27:00.931779 : GSIPC:TKT: collect msg 0x7fa620af6490 from 1 for rcvr 0, tickets 1 2012-06-12 22:27:00.931794 : kjbrcvdscn[0x0.3f72dc][from 1][idx 2012-06-12 22:27:00.931810 : kjbrcvdscn[no bscn dd_master_inst_kjxmddi == 1 * kjddind: dump sgh: NXTIN (nil) 0 wq 0 cvtops x0 0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1 BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2 BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2 NXTOUT (nil) 0 wq 0 cvtops x0 0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1 2012-06-12 22:27:00.932058*: kjddind: req->timestamp [0.15], kjddt [0.15] 2012-06-12 22:27:00.932058*: >> DDmsg:KJX_DD_VALIDATE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1 2012-06-12 22:27:00.932058*: lock [(nil),0], op = [vald_dd] 2012-06-12 22:27:00.932058*: kjddind: updated local timestamp [0.15] * kjddind: case KJX_DD_VALIDATE *** 2012-06-12 22:27:00.932 * kjddvald called: kjxmddi stuff: * cont_lockp (nil) * dd_lockp 0x95023930 * dd_inst 1 * dd_master_inst 1 * sgh graph: NXTIN (nil) 0 wq 0 cvtops x0 0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1 BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2 BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2 NXTOUT (nil) 0 wq 0 cvtops x0 0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1 POP WFG NODE: lock=(nil) * kjddvald: dump the PRQ: BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2 BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2 * kjddvald: KJDD_NXTONOD ->node_kjddsg.dinst_kjddnd =1 * kjddvald: ... which is not my node, my subgraph is validated but the cycle is not complete Global blockers dump start:--------------------------------- DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x80016][0x5d4],[TX][ext 0x2,0x0] ??dead lock!!! ???????11.2.0.3???? RAC LMD???????????”_lm_dd_interval”????????????20s?  ???????10g?_lm_dd_interval???60s,??????Processes?????????????????,????????????Server Process????????60s??????11g?????(??????LMD???????)???????,???????????10s??? Enqueue Deadlock Detection? ?11g??? 
RAC?LMD???????hidden parameter ????”_lm_dd_interval”???,RAC????????????????,???????????: SQL> col name for a50 SQL> col describ for a60 SQL> col value for a20 SQL> set linesize 140 pagesize 1400 SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ 2 FROM SYS.x$ksppi x, SYS.x$ksppcv y 3 WHERE x.inst_id = USERENV ('Instance') 4 AND y.inst_id = USERENV ('Instance') 5 AND x.indx = y.indx 6 AND x.ksppinm like '_lm_dd%'; NAME VALUE DESCRIB -------------------------------------------------- -------------------- ------------------------------------------------------------ _lm_dd_interval 20 dd time interval in seconds _lm_dd_scan_interval 5 dd scan interval in seconds _lm_dd_search_cnt 3 number of dd search per token get _lm_dd_max_search_time 180 max dd search time per token _lm_dd_maxdump 50 max number of locks to be dumped during dd validation _lm_dd_ignore_nodd FALSE if TRUE nodeadlockwait/nodeadlockblock options are ignored 6 rows selected.

    Read the article

  • Creating an email notification system based on polling database rows

    - by Ashish Sharma
    I have to design an email notification system based on the following requirements: The email notifications would be created based on polling rows in a MySQL 5.5 DB table when they are in a particular 'Completed' state. The email notification should be sent out in no more than 5 minutes from the time the row was created in the DB table (at the time of DB table row creation the state of the row might not be 'Completed'). If the 5 minutes expire without the DB table row reaching the 'Completed' state, a separate email notification needs to be sent (basically telling the user that the original email notification will be delayed), and then the email notification is sent as and when the row state reaches 'Completed'. The rest of the system requirements are: adding relevant checks to monitor the whole system via an MBeans interface, and the system should be scalable so that if the rate of DB table row creation increases, the email notification system is able to ramp up. So I request suggestions on the following lines: What approach should I take in solving the problem described from a programming/design pattern point of view? Suggestions for any third party plugin/software that can be used to solve the problem described? Points to take care of regarding scalability and monitoring the health of the system? Java is the language of preference but I am open to using off the shelf components that can be interfaced with the Java language or provide standard ports for communication. Currently I do have an in-house grown system (written in Java) that is catering to the specified requirements, but it's now crumbling under increased load and I want to give the problem a fresh look. Thanks in advance, Ashish

    Read the article

  • Reading and conditionally updating N rows, where N > 100,000 for DNA Sequence processing

    - by makerofthings7
    I have a proof of concept application that uses Azure tables to associate DNA sequences to "something". Table 1 is the master table. It uniquely lists every DNA sequence. The PK is a load balanced hash of the RK. The RK is the unique encoded value of the DNA sequence. Additional tables are created per subject. Each subject has a list of N DNA sequences that have one reference in the Master table, where N is 100,000. It is possible for many tables to reference the same DNA sequence, but in this case only one entry will be present in the Master table. My Azure dilemma: I need to lock the reference in the Master table as I work with the data. I need to handle timeouts, and prevent other threads from overwriting my data as one C# thread is working with the information. Other threads need to realise that this is locked, and move onto other unlocked records and do the work. Ideally I'd like to get some progress report of how my computation is going, and have the option to cancel the process (and unwind the locks). Question What is the best approach for this? I'm looking at these code snippets for inspiration: http://blogs.msdn.com/b/jimoneil/archive/2010/10/05/azure-home-part-7-asynchronous-table-storage-pagination.aspx http://stackoverflow.com/q/4535740/328397
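    One common way to approach the locking question above is a time-limited lease stored on the row itself, enforced with the ETag-style conditional update that Azure Table storage provides. The sketch below is mine, not from the linked snippets, and the SequenceRow/ISequenceTable shapes are hypothetical stand-ins rather than the real storage SDK surface; a conditional replace that fails means another worker owns the row and the caller should move on to an unlocked record.

        using System;

        // Hypothetical row shape for the Master table entry being leased.
        class SequenceRow
        {
            public string PartitionKey;      // load-balanced hash of the row key
            public string RowKey;            // encoded DNA sequence
            public string ETag;              // optimistic-concurrency token from the service
            public string LockOwner;         // which worker currently holds the lease
            public DateTime LockExpiresUtc;  // lease expiry handles worker timeouts
        }

        // Hypothetical minimal client: TryReplace succeeds only if the stored ETag
        // still matches expectedETag (i.e. nobody changed the row in between).
        interface ISequenceTable
        {
            SequenceRow Get(string partitionKey, string rowKey);
            bool TryReplace(SequenceRow row, string expectedETag);
        }

        class SequenceLocker
        {
            public static bool TryAcquire(ISequenceTable table, string pk, string rk,
                                          string workerId, TimeSpan leaseLength)
            {
                SequenceRow row = table.Get(pk, rk);
                if (row == null) return false;

                bool lockedByOther = row.LockOwner != null &&
                                     row.LockExpiresUtc > DateTime.UtcNow;
                if (lockedByOther) return false;

                string originalETag = row.ETag;
                row.LockOwner = workerId;
                row.LockExpiresUtc = DateTime.UtcNow + leaseLength;
                return table.TryReplace(row, originalETag);   // false => someone else won the race
            }
        }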

    Read the article

  • Project Euler 18: (Iron)Python

    - by Ben Griswold
    In my attempt to learn (Iron)Python out in the open, here’s my solution for Project Euler Problem 18.  As always, any feedback is welcome.

        # Euler 18
        # http://projecteuler.net/index.php?section=problems&id=18
        # By starting at the top of the triangle below and moving
        # to adjacent numbers on the row below, the maximum total
        # from top to bottom is 23.
        #
        # 3
        # 7 4
        # 2 4 6
        # 8 5 9 3
        #
        # That is, 3 + 7 + 4 + 9 = 23.
        # Find the maximum total from top to bottom of the triangle below:
        # 75
        # 95 64
        # 17 47 82
        # 18 35 87 10
        # 20 04 82 47 65
        # 19 01 23 75 03 34
        # 88 02 77 73 07 63 67
        # 99 65 04 28 06 16 70 92
        # 41 41 26 56 83 40 80 70 33
        # 41 48 72 33 47 32 37 16 94 29
        # 53 71 44 65 25 43 91 52 97 51 14
        # 70 11 33 28 77 73 17 78 39 68 17 57
        # 91 71 52 38 17 14 91 43 58 50 27 29 48
        # 63 66 04 68 89 53 67 30 73 16 69 87 40 31
        # 04 62 98 27 23 09 70 98 73 93 38 53 60 04 23
        # NOTE: As there are only 16384 routes, it is possible to solve
        # this problem by trying every route. However, Problem 67, is the
        # same challenge with a triangle containing one-hundred rows; it
        # cannot be solved by brute force, and requires a clever method! ;o)

        import time
        start = time.time()

        triangle = [
            [75],
            [95, 64],
            [17, 47, 82],
            [18, 35, 87, 10],
            [20, 04, 82, 47, 65],
            [19, 01, 23, 75, 03, 34],
            [88, 02, 77, 73, 07, 63, 67],
            [99, 65, 04, 28, 06, 16, 70, 92],
            [41, 41, 26, 56, 83, 40, 80, 70, 33],
            [41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
            [53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
            [70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
            [91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
            [63, 66, 04, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
            [04, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 04, 23]]

        # Loop through each row of the triangle starting at the base.
        for a in range(len(triangle) - 1, -1, -1):
            for b in range(0, a):
                # Get the maximum value for adjacent cells in current row.
                # Update the cell which would be one step prior in the path
                # with the new total. For example, compare the first two
                # elements in row 15. Add the max of 04 and 62 to the first
                # position of row 14. This provides the max total from row 14
                # to 15 starting at the first position. Continue to work up
                # the triangle until the maximum total emerges at the
                # triangle's apex.
                triangle[a-1][b] += max(triangle[a][b], triangle[a][b+1])

        print triangle[0][0]
        print "Elapsed Time:", (time.time() - start) * 1000, "millisecs"
        a = raw_input('Press return to continue')

    Read the article

  • Smart defaults [SSDT]

    - by jamiet
    I’ve just discovered a new, somewhat hidden, feature in SSDT that I didn’t know about and figured it would be worth highlighting here because I’ll bet not many others know about it either; the feature is called Smart Defaults. It gets around the problem of adding a NOT NULLable column to an existing table that already contains data. Prior to this feature you would need to define a DEFAULT constraint, yet it feels rather cumbersome to create an object purely for the purpose of pushing through a deployment; that’s the situation Smart Defaults is meant to alleviate. The Smart Defaults option lives in the advanced section of a Publish Profile file. The description of the setting is “Automatically provides a default value when updating a table that contains data with a column that does not allow null values”; in other words, checking that option will cause SSDT to insert an arbitrary default value into your newly created NOT NULLable column. In case you’re wondering how it does it, here’s how: SSDT creates a DEFAULT constraint at the same time as the column is created and then immediately removes that constraint: ALTER TABLE [dbo].[T1] ADD [C1] INT NOT NULL, CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b] DEFAULT 0 FOR [C1]; ALTER TABLE [dbo].[T1] DROP CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b]; You can then update the value as appropriate in a Post-Deployment script. Pretty cool! On the downside, you can only specify this option for the whole project, not for an individual table or even an individual column. I’m not sure that I’d want to turn this on for an entire project as it could hide problems that a failed deployment would highlight; in other words, Smart Defaults could be seen as “papering over the cracks”. If you think that should be improved, go and vote (and leave a comment) at [SSDT] Allow us to specify Smart defaults per table or even per column. @Jamiet
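    If you drive deployments from code rather than a Publish Profile, the same switch is, if I recall the option name correctly, exposed as GenerateSmartDefaults on DacDeployOptions in the DacFx API (and as /p:GenerateSmartDefaults for sqlpackage.exe). A minimal C# sketch, with a hypothetical connection string, database name and .dacpac path:

    using Microsoft.SqlServer.Dac;

    class SmartDefaultsDeploy
    {
        static void Main()
        {
            // Hypothetical target server and package location.
            var services = new DacServices("Server=.;Database=master;Integrated Security=true");
            using (var package = DacPackage.Load(@"C:\build\MyDatabase.dacpac"))
            {
                var options = new DacDeployOptions
                {
                    GenerateSmartDefaults = true   // the programmatic equivalent of the checkbox
                };
                services.Deploy(package, "MyDatabase", upgradeExisting: true, options: options);
            }
        }
    }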

    Read the article

  • SQL SERVER – Concurrency Problems and their Relationship with Isolation Level

    - by pinaldave
    Concurrency is, simply put, the capability of the machine to support two or more transactions working with the same data at the same time. This usually comes up when data is being modified; during retrieval of the data it is not an issue. Most concurrency problems can be avoided by SQL locks. There are four types of concurrency problems visible in normal programming. 1) Lost Update – This problem occurs when two transactions are involved and both are unaware of each other. The transaction which occurs later overwrites the changes made by the earlier update. 2) Dirty Reads – This problem occurs when a transaction selects data that isn’t committed by another transaction, leading it to read data which may not exist when the transactions are over. Example: Transaction 1 changes the row. Transaction 2 selects the row. Transaction 1 rolls back the changes. Transaction 2 has selected a row which does not exist. 3) Nonrepeatable Reads – This problem occurs when two SELECT statements over the same data return different values because another transaction has updated the data between the two SELECT statements. Example: Transaction 1 selects a row, which is later updated by Transaction 2. When Transaction 1 later selects the row it gets a different value. 4) Phantom Reads – This problem occurs when an UPDATE/DELETE is happening on one set of data while an INSERT/UPDATE is happening on the same set of data, leading to inconsistent data in the earlier transaction when both transactions are over. Example: Transaction 1 is deleting 10 rows which are marked as rows to be deleted, while at the same time Transaction 2 inserts a row marked as deleted. When Transaction 1 is done deleting rows, there will still be rows marked to be deleted. When two or more transactions are updating the data, concurrency is the biggest issue. I commonly see people toying around with isolation levels or locking hints (e.g. NOLOCK), which can very well compromise your data integrity, leading to much larger issues in the future. Here is the quick mapping of the isolation levels to concurrency problems:
    Isolation Level    Dirty Reads   Lost Update   Nonrepeatable Reads   Phantom Reads
    Read Uncommitted   Yes           Yes           Yes                   Yes
    Read Committed     No            Yes           Yes                   Yes
    Repeatable Read    No            No            No                    Yes
    Snapshot           No            No            No                    No
    Serializable       No            No            No                    No
    I hope this short article gives some quick understanding of concurrency issues and their relation to isolation levels. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology
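    The isolation level in the table above is chosen per transaction. As a quick illustration from client code, here is a minimal C# ADO.NET sketch (the connection string and table name are hypothetical) that runs a read under SNAPSHOT isolation, so the reader neither blocks writers nor sees their uncommitted changes:

    using System.Data;
    using Microsoft.Data.SqlClient; // or System.Data.SqlClient

    class IsolationDemo
    {
        static void Main()
        {
            // Snapshot isolation must first be enabled on the database:
            // ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
            using var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true");
            connection.Open();

            // Every command enlisted in this transaction sees a consistent snapshot of the data,
            // avoiding dirty, nonrepeatable and phantom reads without taking shared locks.
            using var transaction = connection.BeginTransaction(IsolationLevel.Snapshot);
            using var command = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders", connection, transaction);
            var count = (int)command.ExecuteScalar();

            transaction.Commit();
        }
    }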

    Read the article

  • Aggregating Excel cell contents that match a label [migrated]

    - by Josh
    I'm sure this isn't a terribly difficult thing, but it's not the type of question that easily lends itself to internet searches. I've been assigned a project for work involving a complex spreadsheet. I've done the usual =SUM and other basic Excel formulas, and I've got enough coding background that I'm able to at least fudge my way through VBA, but I'm not certain how to proceed with one part of the task. Simple version: On Sheet 1 I have a list of people (one on each row, person's name in column A), on sheet 2 I have a list of groups (one on each row, group name in column A). Each name in Sheet 1 has its own row, and I have a "Data Validation" dropdown menu where you choose the group each person belongs to. That dropdown is sourced from Sheet 2, where each group has a row. So essentially the data validation source for Sheet 1's "Group" column is just "=Sheet2!$a1:a100" or whatever. The problem is this: I want each group row in Sheet 2 to have a formula which results in a list of all the users which have been assigned to that group on Sheet 1. What I mean is something the equivalent of "select * from PeopleTab where GROUP = ThisGroup". The resulting cell would just stick the names together like "Bob Smith, Joe Jones, Sally Sanderson" I've been Googling for hours but I can't think of a way to phrase my search query to get the results I want. Here's an example of desired result (Dash-delimited. Can't find a way to make it look nice, table tags don't seem to work here): (Sheet 1) Bob Smith - Group 1 (selected from dropdown) Joe Jones - Group 2 (selected from dropdown) Sally Sanderson - Group 1 (selected from dropdown) (Sheet 2) Group 1 - Bob Smith, Sally Sanderson (result of formula) Group 2 - Joe Jones (result of formula) What formula (or even what function) do I use on that second column of sheet 2 to make a flat list out of the members of that group?

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part II

    - by dbayard
    Part II – Solving Big Problems with Oracle R Enterprise In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet.  We demonstrated the calculations against sample data for a small set of accounts.  While this worked fine, in the real-world the problem is much bigger because the amount of data is much bigger.  So much bigger that our approach in the previous post won’t scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In this post, we will show how we moved from sample data environment to working with full-scale data.  This post is based on actual work we did for a financial services customer during a recent proof-of-concept. Getting started with the Database At this point, we have some sample data and our IRR function.  We were at a similar point in our customer proof-of-concept exercise- we had sample data but we did not have the full customer data yet.  So our database was empty.  But, this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the).  The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create().  The code also shows how we can access the database table IRR_DATA as if it was a normal R data.frame named IRR_DATA. If we go to sql*plus, we can also check out our new IRR_DATA table: At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA.  So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data from a single account from the IRR_DATA table, pull it into local R memory, then call our IRR function.  This worked.  No SQL coding required! Going from Crawling to Walking Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts.  In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series.  Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply().  You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1 Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via  “R CMD INSTALL myIRR”). 
The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running.  What actually happens is that ore.groupApply uses the Oracle database to perform the work.  And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT.  Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function.  Then the Oracle database combines all the individual results from the calls to the R function. This is significant because now the embedded R engine only needs to deal with the data for a single account at a time.  Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care.  Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts). Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program.  Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server- this is both a performance benefit because network transmission of large amounts of data take time and a security benefit because it is harder to protect private data once you start shipping around your intranet. Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once. From Walking to Running ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance.  This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation.  But, this is not an issue for ORE.  Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations.  That is extremely exciting and the way we actually did these calculations in the customer proof. As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as “rqGroupEval”.  To use rqGroupEval via SQL, there is a bit of simple setup needed.  Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database’s pipeline table function mechanisms. Here is the setup script: At this point, our initial setup of rqGroupEval is done for the IRR_DATA table.  The next step is to define our R function to the database.  We do that via a call to ORE’s rqScriptCreate. Now we can test it.  The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax.  The first argument to irr_dataGroupEval is a cursor defining our input.  You can add additional where clauses and subqueries to this cursor as appropriate.  The second argument is any additional inputs to the R function.  The third argument is the text of a dummy select statement.  The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return.  The fourth argument is the column of the input table to split/group by.  
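    For readers who think in .NET rather than R, the split-apply-combine shape that ore.groupApply implements is roughly what you would write client-side with PLINQ. Here is a purely illustrative C# sketch; the CashFlow record, the Account field and the SimpleMWRR stub are hypothetical stand-ins, and unlike ore.groupApply this version pulls all the data into local memory rather than leaving it in the database.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical shape of one detail row from IRR_DATA
    record CashFlow(string Account, DateTime Date, decimal Amount, decimal MarketValue);

    static class IrrDemo
    {
        // Stand-in for the real money-weighted rate of return calculation
        static double SimpleMWRR(IReadOnlyList<CashFlow> rows) => 0.0;

        static IDictionary<string, double> CalculateAll(IEnumerable<CashFlow> irrData) =>
            irrData
                .AsParallel()                         // apply across cores, like the parallel R engines
                .GroupBy(r => r.Account)              // split by ACCOUNT
                .ToDictionary(g => g.Key,             // combine the per-account results
                              g => SimpleMWRR(g.OrderBy(r => r.Date).ToList()));
    }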
The final argument is the name of the R function as you defined it when you called rqScriptCreate(). The Real-World Results In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example.  For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so.  In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that.  And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions.  For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine.  As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations.  To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features.  Let’s look at the SQL used in the customer proof: Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function.  That is all we need to do to enable Oracle to use parallel R engines. Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images to be able to view the images full-screen to see the entire image): From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above.  You may need to right click on the above images and view the images full-screen to see the entire image): The SQL completed in 110 seconds (1.8minutes) We calculated rate of returns for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation) We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation) We ran with 72 degrees of parallelism spread across 4 database servers Most of our 110seconds was spent in the “External Procedure call” event On average, we performed 8,200 executions of our R function per second (110s/911k accounts) On average, each execution was passed 110 rows of data (103m detail rows/911k accounts) On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods) On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s) R + Oracle R Enterprise: Best of R + Best of Oracle Database This blog post series started by describing a real customer problem: how to perform a lot of calculations on a lot of data in a short period of time.  While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues).  It also showed that we could calculate 5 time periods of rate of returns for almost a million individual accounts in less than 2 minutes. 
In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world.  In that next post, instead of having our data in an Oracle database, our data will live in Hadoop, and we will show how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.

    Read the article

  • Bash script for mysql backup - error handling

    - by Jure1873
    I'm trying to back up a bunch of MyISAM tables in a way that would allow me to rsync/rdiff the backup directory to a remote location. I've come up with a script that dumps only the recently changed tables and sets the date of the file so that rsync can pick up only the changed ones, but now I don't know how to do the error handling: I would like the script to exit with a non-zero value if there are errors. How could I do that? #!/bin/bash BKPDIR="/var/backups/db-mysql" mkdir -p $BKPDIR ERRORS=0 FIELDS="TABLE_SCHEMA, TABLE_NAME, UPDATE_TIME" W_COND="UPDATE_TIME >= DATE_ADD(CURDATE(), INTERVAL -2 DAY) AND TABLE_SCHEMA<>'information_schema'" mysql --skip-column-names -e "SELECT $FIELDS FROM information_schema.tables WHERE $W_COND;" | while read db table tstamp; do echo "DB: $db: TABLE: $table: ($tstamp)" mysqldump $db $table | gzip > $BKPDIR/$db-$table.sql.gz touch -d "$tstamp" $BKPDIR/$db-$table.sql.gz done exit $ERRORS

    Read the article

  • Updating physics for animated models

    - by Mathias Hölzl
    For a new game we have to set up a scene with a minimum of 30 bone-animated models (a shooter). The problem is that the update process for the animated models takes too long. This is what I do: each character has ~30 bones, and on every update tick the animation gets calculated and every bone fires an event with its new matrix. The physics engine receives the event with the new matrix and updates the collision shape for that bone. The time that it takes to build the animation isn't that bad (0.2ms for 30 bones - 6ms for 30 models). But the main problem is that the physics engine (Bullet) uses a different matrix for transformation, so it's necessary to convert it. Code for matrix conversion: (~0.005ms) btTransform CLEAR_PHYSICS_API Mat_to_btTransform( Mat mat ) { btMatrix3x3 bulletRotation; btVector3 bulletPosition; XMFLOAT4X4 matData = mat.GetStorage(); // copy rotation matrix for ( int row=0; row<3; ++row ) for ( int column=0; column<3; ++column ) bulletRotation[row][column] = matData.m[column][row]; for ( int column=0; column<3; ++column ) bulletPosition[column] = matData.m[3][column]; return btTransform( bulletRotation, bulletPosition ); } The function for updating the transform (physics): void CLEAR_PHYSICS_API BulletPhysics::VKinematicMove(Mat mat, ActorId aid) { if ( btRigidBody * const body = FindActorBody( aid ) ) { btTransform tmp = Mat_to_btTransform( mat ); body->setWorldTransform( tmp ); } } The real problem is the function FindActorBody(id): ActorIDToBulletActorMap::const_iterator found = m_actorBodies.find( id ); if ( found != m_actorBodies.end() ) return found->second; All physics actors are stored in m_actorBodies, and that's why the updating process takes too long. But I have no idea how I could avoid this. Friendly greetings, Mathias

    Read the article

  • Matrix Multiplication with C++ AMP

    - by Daniel Moth
    As part of our API tour of C++ AMP, we looked recently at parallel_for_each. I ended that post by saying we would revisit parallel_for_each after introducing array and array_view. Now is the time, so this is part 2 of parallel_for_each, and also a post that brings together everything we've seen until now. The code for serial and accelerated Consider a naïve (or brute force) serial implementation of matrix multiplication  0: void MatrixMultiplySerial(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 1: { 2: for (int row = 0; row < M; row++) 3: { 4: for (int col = 0; col < N; col++) 5: { 6: float sum = 0.0f; 7: for(int i = 0; i < W; i++) 8: sum += vA[row * W + i] * vB[i * N + col]; 9: vC[row * N + col] = sum; 10: } 11: } 12: } We notice that each loop iteration is independent from each other and so can be parallelized. If in addition we have really large amounts of data, then this is a good candidate to offload to an accelerator. First, I'll just show you an example of what that code may look like with C++ AMP, and then we'll analyze it. It is assumed that you included at the top of your file #include <amp.h> 13: void MatrixMultiplySimple(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 14: { 15: concurrency::array_view<const float,2> a(M, W, vA); 16: concurrency::array_view<const float,2> b(W, N, vB); 17: concurrency::array_view<concurrency::writeonly<float>,2> c(M, N, vC); 18: concurrency::parallel_for_each(c.grid, 19: [=](concurrency::index<2> idx) restrict(direct3d) { 20: int row = idx[0]; int col = idx[1]; 21: float sum = 0.0f; 22: for(int i = 0; i < W; i++) 23: sum += a(row, i) * b(i, col); 24: c[idx] = sum; 25: }); 26: } First a visual comparison, just for fun: The beginning and end is the same, i.e. lines 0,1,12 are identical to lines 13,14,26. The double nested loop (lines 2,3,4,5 and 10,11) has been transformed into a parallel_for_each call (18,19,20 and 25). The core algorithm (lines 6,7,8,9) is essentially the same (lines 21,22,23,24). We have extra lines in the C++ AMP version (15,16,17). Now let's dig in deeper. Using array_view and extent When we decided to convert this function to run on an accelerator, we knew we couldn't use the std::vector objects in the restrict(direct3d) function. So we had a choice of copying the data to the the concurrency::array<T,N> object, or wrapping the vector container (and hence its data) with a concurrency::array_view<T,N> object from amp.h – here we used the latter (lines 15,16,17). Now we can access the same data through the array_view objects (a and b) instead of the vector objects (vA and vB), and the added benefit is that we can capture the array_view objects in the lambda (lines 19-25) that we pass to the parallel_for_each call (line 18) and the data will get copied on demand for us to the accelerator. Note that line 15 (and ditto for 16 and 17) could have been written as two lines instead of one: extent<2> e(M, W); array_view<const float, 2> a(e, vA); In other words, we could have explicitly created the extent object instead of letting the array_view create it for us under the covers through the constructor overload we chose. The benefit of the extent object in this instance is that we can express that the data is indeed two dimensional, i.e a matrix. When we were using a vector object we could not do that, and instead we had to track via additional unrelated variables the dimensions of the matrix (i.e. 
with the integers M and W) – aren't you loving C++ AMP already? Note that the const before the float when creating a and b, will result in the underling data only being copied to the accelerator and not be copied back – a nice optimization. A similar thing is happening on line 17 when creating array_view c, where we have indicated that we do not need to copy the data to the accelerator, only copy it back. The kernel dispatch On line 18 we make the call to the C++ AMP entry point (parallel_for_each) to invoke our parallel loop or, as some may say, dispatch our kernel. The first argument we need to pass describes how many threads we want for this computation. For this algorithm we decided that we want exactly the same number of threads as the number of elements in the output matrix, i.e. in array_view c which will eventually update the vector vC. So each thread will compute exactly one result. Since the elements in c are organized in a 2-dimensional manner we can organize our threads in a two-dimensional manner too. We don't have to think too much about how to create the first argument (a grid) since the array_view object helpfully exposes that as a property. Note that instead of c.grid we could have written grid<2>(c.extent) or grid<2>(extent<2>(M, N)) – the result is the same in that we have specified M*N threads to execute our lambda. The second argument is a restrict(direct3d) lambda that accepts an index object. Since we elected to use a two-dimensional extent as the first argument of parallel_for_each, the index will also be two-dimensional and as covered in the previous posts it represents the thread ID, which in our case maps perfectly to the index of each element in the resulting array_view. The kernel itself The lambda body (lines 20-24), or as some may say, the kernel, is the code that will actually execute on the accelerator. It will be called by M*N threads and we can use those threads to index into the two input array_views (a,b) and write results into the output array_view ( c ). The four lines (21-24) are essentially identical to the four lines of the serial algorithm (6-9). The only difference is how we index into a,b,c versus how we index into vA,vB,vC. The code we wrote with C++ AMP is much nicer in its indexing, because the dimensionality is a first class concept, so you don't have to do funny arithmetic calculating the index of where the next row starts, which you have to do when working with vectors directly (since they store all the data in a flat manner). I skipped over describing line 20. Note that we didn't really need to read the two components of the index into temporary local variables. This mostly reflects my personal choice, in some algorithms to break down the index into local variables with names that make sense for the algorithm, i.e. in this case row and col. In other cases it may i,j,k or x,y,z, or M,N or whatever. Also note that we could have written line 24 as: c(idx[0], idx[1])=sum  or  c(row, col)=sum instead of the simpler c[idx]=sum Targeting a specific accelerator Imagine that we had more than one hardware accelerator on a system and we wanted to pick a specific one to execute this parallel loop on. 
So there would be some code like this anywhere before line 18: vector<accelerator> accs = MyFunctionThatChoosesSuitableAccelerators(); accelerator acc = accs[0]; …and then we would modify line 18 so we would be calling another overload of parallel_for_each that accepts an accelerator_view as the first argument, so it would become: concurrency::parallel_for_each(acc.default_view, c.grid, ...and the rest of your code remains the same… how simple is that? Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • SSIS Debugging Tip: Using Data Viewers

    - by Jim Giercyk
    When you have an SSIS package error, it is often very helpful to see the data records that are causing the problem.  After all, if your input has 50,000 records and 1 of them has corrupt data, it can be a chore.  Your execution results will tell you which column contains the bad data, but not which record… enter the Data Viewer. In this scenario I have created a truncation error.  The input length of [lastname] is 50, but the output table has a length of 15.  When it runs, at least one of the records causes the package to fail.  Now what?  We can tell from our execution results that there is a problem with [lastname], but we have no idea WHICH record.  Let’s identify the row that is actually causing the problem.  First, we grab the oft-forgotten Row Count shape from our toolbar and connect it to the error output from our input query.  Remember that in order to intercept errors with the error output, you must redirect them.  The Row Count shape requires 1 integer variable.  For our purposes, we will not reference the variable, but it is still required in order for the package to run.  Typically we would use the variable to hold the number of rows in the table and refer back to it later in our process.  We are simply using the Row Count as a “Dead End” for errors.  I called my variable RowCounter.  To create a variable, with no shapes selected, right-click on the background and choose Variable.  Once we have set up the Row Count shape, we can right-click on the red line (error output) from the query, and select Data Viewers.  In the popup, we click the Add button to bring up the configuration window.  There are other fancier options we can play with, but for now we just want to view the output in a grid.  We select Grid, then click OK on all of the popup windows to shut them down.  We should now see a grid with a pair of glasses on the error output line.  So, we are ready to catch the error output in a grid and see what is causing the problem!  This time when we run the package, it does not fail because we directed the error to the Row Count.  We also get a popup window showing the error record in a grid.  If there were multiple errors we would see them all.  Indeed, the [lastname] column is longer than 15 characters.  Notice the last column in the grid, [Error Code – Description].  We knew this was a truncation error before we added the grid, but if you have worked with SSIS for any length of time, you know that some errors are much more obscure.  The description column can be very useful under those circumstances! Data viewers can be used any time we want to see the data that is actually in the pipeline; they stop the package temporarily until we close them.  Also remember that the Row Count shape can be used as a “Dead End”.  It is useful during development when we want to see the output from a dataflow, but don’t want to update a table or file with the data.  Data viewers are an invaluable tool for both development and debugging.  Just remember to REMOVE THEM before putting your package into production.

    Read the article

  • RPi and Java Embedded GPIO: Hooking Up Your Wires for Java

    - by hinkmond
    So, you bought your blue jumper wires, your LEDs, your resistors, your breadboard, and your fill of Fry's for the day. How do you hook this cool stuff up to write Java code to blink them LEDs? I'll step you through it. First, look at that pinout diagram of the GPIO header that's on your RPi. Find the pins in the corner of your RPi board and make sure to orient it the right way. The upper left corner pin should have the characters "P1" next to it on the board. That pin next to "P1" is your Pin #1 (in the diagram). Then, you can start counting left, right, next row, left, right, next row, left, right, and so on: Pins # 1, 2, next row, 3, 4, next row, 5, 6, and so on. Take one blue jumper wire and connect it to Pin # 3 (GPIO0). Connect the other end to a resistor and then the other end of the resistor into the breadboard. Each row of grouped-together holes on a breadboard is connected, so plug the short end of a common cathode LED (the long end of a common anode LED) into a hole that is in the same grouping as where the resistor is plugged in. Then, connect the other end of the LED back to Pin # 6 (GND) on the RPi GPIO header. Now you have your first LED connected, ready for you to write some Java code to turn it on and off. (As extra credit, you can connect 7 other LEDs the same way, with one lead to Pins # 5, 7, 11, 13, 15, 19 & 21). Whew! That wasn't so bad, was it? The next blog post on this thread will have some Java source code for you to try... Hinkmond

    Read the article

  • How can I set up conditional formatting to highlight a range only if all its cells are empty?

    - by Jennifer
    I am new to conditional formatting and having a hard time. I have 6 columns with 100 rows. What I would like to have happen is to highlight the row in one color if there is no data in it at all. If there is data in even one cell within the row, however, I would like the highlighting to be removed from the row completely. Currently I have it set up to highlight the entire row if there is no data in it, and if there is data in one cell, only that cell has no highlighting; I can't seem to make the entire row's highlighting disappear. I have used this formula to determine which cells to format: =I16:N16="", with the formatting color set to yellow. I know I have to add a second conditional format, but I have tried numerous different formulas and can't seem to get it to work.

    Read the article

  • Per-pixel collision detection - why does XNA transform matrix return NaN when adding scaling?

    - by JasperS
    I looked at the TransformCollision sample on MSDN and added the Matrix.CreateTranslation part to a property in my collision detection code but I wanted to add scaling. The code works fine when I leave scaling commented out but when I add it and then do a Matrix.Invert() on the created translation matrix the result is NaN ({NaN,NaN,NaN},{NaN,NaN,NaN},...) Can anyone tell me why this is happening please? Here's the code from the sample: // Build the block's transform Matrix blockTransform = Matrix.CreateTranslation(new Vector3(-blockOrigin, 0.0f)) * // Matrix.CreateScale(block.Scale) * would go here Matrix.CreateRotationZ(blocks[i].Rotation) * Matrix.CreateTranslation(new Vector3(blocks[i].Position, 0.0f)); public static bool IntersectPixels( Matrix transformA, int widthA, int heightA, Color[] dataA, Matrix transformB, int widthB, int heightB, Color[] dataB) { // Calculate a matrix which transforms from A's local space into // world space and then into B's local space Matrix transformAToB = transformA * Matrix.Invert(transformB); // When a point moves in A's local space, it moves in B's local space with a // fixed direction and distance proportional to the movement in A. // This algorithm steps through A one pixel at a time along A's X and Y axes // Calculate the analogous steps in B: Vector2 stepX = Vector2.TransformNormal(Vector2.UnitX, transformAToB); Vector2 stepY = Vector2.TransformNormal(Vector2.UnitY, transformAToB); // Calculate the top left corner of A in B's local space // This variable will be reused to keep track of the start of each row Vector2 yPosInB = Vector2.Transform(Vector2.Zero, transformAToB); // For each row of pixels in A for (int yA = 0; yA < heightA; yA++) { // Start at the beginning of the row Vector2 posInB = yPosInB; // For each pixel in this row for (int xA = 0; xA < widthA; xA++) { // Round to the nearest pixel int xB = (int)Math.Round(posInB.X); int yB = (int)Math.Round(posInB.Y); // If the pixel lies within the bounds of B if (0 <= xB && xB < widthB && 0 <= yB && yB < heightB) { // Get the colors of the overlapping pixels Color colorA = dataA[xA + yA * widthA]; Color colorB = dataB[xB + yB * widthB]; // If both pixels are not completely transparent, if (colorA.A != 0 && colorB.A != 0) { // then an intersection has been found return true; } } // Move to the next pixel in the row posInB += stepX; } // Move to the next row yPosInB += stepY; } // No intersection found return false; }
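    For what it's worth, a common cause of this exact symptom is a degenerate scale: if block.Scale is still 0 (or, for a vector scale, has a 0 component), Matrix.CreateScale produces a singular matrix, and Matrix.Invert on a singular matrix comes back as all NaN. A hedged C# sketch of the block transform with scaling and a guard, assuming Scale is a float on a hypothetical Block class that mirrors the sample's fields:

    // Purely illustrative; Block, Scale, Rotation, Position and blockOrigin mirror the sample's fields.
    Matrix BuildBlockTransform(Block block, Vector2 blockOrigin)
    {
        // Matrix.CreateScale(0) is singular, and inverting a singular matrix
        // yields NaN in every element - the symptom described above.
        float scale = block.Scale == 0.0f ? 1.0f : block.Scale;

        return Matrix.CreateTranslation(new Vector3(-blockOrigin, 0.0f)) *
               Matrix.CreateScale(scale) *
               Matrix.CreateRotationZ(block.Rotation) *
               Matrix.CreateTranslation(new Vector3(block.Position, 0.0f));
    }

    // Optional extra check before calling IntersectPixels (which inverts transformB internally):
    // if (Math.Abs(blockTransform.Determinant()) < 1e-6f) skip the per-pixel test for this block.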

    Read the article

  • Help with dual booting Windows 8.1 Professional and Ubuntu 13.10

    - by user1292548
    I recently installed a clean version of Windows 8.1 Professional on my Lenovo Y500 (with a Samsung 256GB 840 Pro SSD). I have Windows all set up and running normally. I am trying to dual boot Windows 8.1 and Ubuntu 13.10, but the installation procedure doesn't allow me to "Install alongside...", nor does it show my SSD partitions correctly when I choose the "Something Else" option. I have created a 25GB partition of free space in the Windows disk manager, but on the Ubuntu installation screen it shows the whole drive as free space. I have tried installing with a burned .ISO disk and a bootable USB; the results are the same for both. Windows Disk Management screen: http://imageshack.us/a/img855/9504/59zu.jpg The Ubuntu installation screen: http://imageshack.us/a/img62/2712/9g6i.jpg I ran into this problem before when trying to dual boot Ubuntu and Windows 7 Professional a month ago, but I gave up and never resolved the issue. --EDIT-- I tried what Eero Aaltonen suggested, and this is my result: ubuntu@ubuntu:~$ sudo parted /dev/sda print Warning: /dev/sda contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted -- possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table? Yes/No? yes Model: ATA Samsung SSD 840 (scsi) Disk /dev/sda: 256GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags ubuntu@ubuntu:~$

    Read the article

  • SQL language drawbacks, The Third Manifesto

    - by David Portabella
    Some time ago I read about SQL language drawbacks (in the basic language specification, not vendor-specific), and one of the drawbacks was that the language does not allow you to create a set of tuples that don't come from a table. For instance, SELECT firstName, lastName FROM people; creates a set of tuples coming from the table people. Now, if I don't have this table people and I want to return a constant, I'd need something like this to return a set of two tuples (without requiring a table): SELECT VALUES('james', 'dean'), ('tom', 'cruisse'); Why would I need that? For the same reasons that we can define constants (not only basic types, but objects and arrays too) in any advanced programming language. Workarounds: yes, I could create a temporary table, fill in the data, and SELECT from that table. But that is a hack to work around the drawbacks of the poor SQL language. I think I read about this somewhere in "The Third Manifesto", but I can't find the paragraph/example discussing this concrete drawback anymore. Do you know a reference for it?
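    As a less drastic workaround than a temporary table, later revisions of the standard (and, for example, T-SQL from SQL Server 2008 onwards) accept a VALUES table constructor as a derived table, which gets close to the wished-for syntax. A minimal C# sketch, with a hypothetical connection string, that returns those two constant tuples without any table:

    using System;
    using Microsoft.Data.SqlClient; // or System.Data.SqlClient

    class RowConstructorDemo
    {
        static void Main()
        {
            // The VALUES constructor plays the role of an inline, constant relation.
            const string sql =
                "SELECT firstName, lastName " +
                "FROM (VALUES ('james', 'dean'), ('tom', 'cruisse')) AS t(firstName, lastName);";

            using var connection = new SqlConnection("Server=.;Database=master;Integrated Security=true"); // hypothetical
            connection.Open();

            using var command = new SqlCommand(sql, connection);
            using var reader = command.ExecuteReader();
            while (reader.Read())
                Console.WriteLine($"{reader.GetString(0)} {reader.GetString(1)}");
        }
    }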

    Read the article

< Previous Page | 254 255 256 257 258 259 260 261 262 263 264 265  | Next Page >