Search Results

Search found 62870 results on 2515 pages for 'usage data'.

Page 56/2515 | < Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63 | Next Page >

Data Integration Solution?

- by Shlomo

At my company we have a number of data feeds and processing that run on any given day. The number of feeds and processing steps is starting to out-number the ability to manage it ad-hoc as it is managed currently. Is there a good solution that helps with logging and managing/scheduling dependencies? For example: A: When file x is FTP dropped into directory D1, kick off processing step B B: Load flat file into DB1 C: When file y is FTP dropped into directory D2, kick off processing Step D D: Load flat file into DB11 E: When B and D are done, churn through the data, and load new data into DB111. F: When Step E is done, launch application process P G: etc... I want those steps to run at the appropriate times, not to mention if B fails, there's no reason to run steps E & F, but I could still run C & D. When I re-run B successfully, it should trigger just E & F to re-run, not C & D. We're a .NET/C#/Sql Server shop, and I'm already familiar with SSIS. Is that really the best there is? That manages steps well, but not external dependencies, or logging. Open source (.NET) preferred, but not required.

Read the article
Save many-to-one relationship from JSON into Core Data

- by Snow Crash

I'm wanting to save a Many-to-one relationship parsed from JSON into Core Data. The code that parses the JSON and does the insert into Core Data looks like this: for (NSDictionary *thisRecipe in recipes) { Recipe *recipe = [NSEntityDescription insertNewObjectForEntityForName:@"Recipe" inManagedObjectContext:insertionContext]; recipe.title = [thisRecipe objectForKey:@"Title"]; NSDictionary *ingredientsForRecipe = [thisRecipe objectForKey:@"Ingredients"]; NSArray *ingredientsArray = [ingredientsForRecipe objectForKey:@"Results"]; for (NSDictionary *thisIngredient in ingredientsArray) { Ingredient *ingredient = [NSEntityDescription insertNewObjectForEntityForName:@"Ingredient" inManagedObjectContext:insertionContext]; ingredient.name = [thisIngredient objectForKey:@"Name"]; } } NSSet *ingredientsSet = [NSSet ingredientsArray]; [recipe setIngredients:ingredientsSet]; Notes: "setIngredients" is a Core Data generated accessor method. There is a many-to-one relationship between Ingredients and Recipe However, when I run this I get the following error: NSCFDictionary managedObjectContext]: unrecognized selector sent to instance If I remove the last line (i.e. [recipe setIngredients:ingredientsSet];) then, taking a peek at the SQLite database, I see the Recipe and Ingredients have been stored but no relationship has been created between Recipe and Ingredients Any suggestions as to how to ensure the relationship is stored correctly?

Read the article
ERROR: Attempted to read or write protected memory. This is often an indication that other memory is corrupt

- by SPSamL

I get this error after having edited a few pages in SharePoint 2010. I have to do an IISReset on both front ends to get this to resolve. I don't know how to fix it or even what else to supply here, but please let me know as the resets now happen several times per day. Log Name: Application Source: ASP.NET 2.0.50727.0 Date: 1/26/2011 11:12:48 AM Event ID: 1309 Task Category: Web Event Level: Warning Keywords: Classic User: N/A Computer: PINTSPSFE02.samcstl.org Description: Event code: 3005 Event message: An unhandled exception has occurred. Event time: 1/26/2011 11:12:48 AM Event time (UTC): 1/26/2011 5:12:48 PM Event ID: c52fb336b7f147a3913fff3617a99d57 Event sequence: 4965 Event occurrence: 2178 Event detail code: 0 Application information: Application domain: /LM/W3SVC/1449762715/ROOT-2-129405348166941887 Trust level: WSS_Minimal Application Virtual Path: / Application Path: C:\inetpub\wwwroot\wss\VirtualDirectories\80\ Machine name: PINTSPSFE02 Process information: Process ID: 5928 Process name: w3wp.exe Account name: SAMC\MossAppPool Exception information: Exception type: AccessViolationException Exception message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. Request information: Request URL: http://mosscluster/Pages/Home.aspx Request path: /Pages/Home.aspx User host address: 10.3.60.26 User: SAMC\BARNMD Is authenticated: True Authentication Type: NTLM Thread account name: SAMC\MossAppPool Thread information: Thread ID: 110 Thread account name: SAMC\MossAppPool Is impersonating: False Stack trace: at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) Custom event details: Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="ASP.NET 2.0.50727.0" /> <EventID Qualifiers="32768">1309</EventID> <Level>3</Level> <Task>3</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2011-01-26T17:12:48.000000000Z" /> <EventRecordID>35834</EventRecordID> <Channel>Application</Channel> <Computer>PINTSPSFE02.samcstl.org</Computer> <Security /> </System> <EventData> <Data>3005</Data> <Data>An unhandled exception has occurred.</Data> <Data>1/26/2011 11:12:48 AM</Data> <Data>1/26/2011 5:12:48 PM</Data> <Data>c52fb336b7f147a3913fff3617a99d57</Data> <Data>4965</Data> <Data>2178</Data> <Data>0</Data> <Data>/LM/W3SVC/1449762715/ROOT-2-129405348166941887</Data> <Data>WSS_Minimal</Data> <Data>/</Data> <Data>C:\inetpub\wwwroot\wss\VirtualDirectories\80\</Data> <Data>PINTSPSFE02</Data> <Data> </Data> <Data>5928</Data> <Data>w3wp.exe</Data> <Data>SAMC\MossAppPool</Data> <Data>AccessViolationException</Data> <Data></Data> <Data>http://mosscluster/Pages/Home.aspx</Data> <Data>/Pages/Home.aspx</Data> <Data>10.3.60.26</Data> <Data>SAMC\BARNMD</Data> <Data>True</Data> <Data>NTLM</Data> <Data>SAMC\MossAppPool</Data> <Data>110</Data> <Data>SAMC\MossAppPool</Data> <Data>False</Data> <Data> at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) </Data> </EventData> </Event>

Read the article
Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

- by Reed

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine. If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine. The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data. Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism. In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library. This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple. Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter. Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel. Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition. Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns. We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change. Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine. Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method. In this case, our for loop iteration block becomes a delegate creating via a lambda expression. This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea. There is a balance to be struck when writing parallel code. We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce. In this case, we have an image of data – most likely hundreds of pixels in both dimensions. By just parallelizing our first loop, each row of pixels can be run as a single task. With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks. This introduces extra overhead with no extra gain, and will actually reduce our overall performance. This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop. I could have just as easily partitioned the inner loop. However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels). My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop. If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance. If we’re iterating through an IList<T> or an array, this is a typical approach. However, there are other iteration statements common in C#. In many situations, we’ll use foreach instead of a for loop. This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation. Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change. Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist. If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression. This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel. The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library. These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism. Iterating through collections and performing “work” is a very common pattern in nearly every application. When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls. Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

Read the article
DMV {dm_os_ring_buffers} - Queries to help pinpoint current Issues / usual usage patterns

- by NeilHambly

I'm been running some queries (below) to help me identify when I have had time-sensitive performance issues around Memory/CPU, I didn't want to load up additional overhead to the system (unless absolutely neccessary) using traces or profiler - naturally we have various methods to do this Perfmon counters, DBCC, DMVs etc.. One quick way I like is to run a few DMV queries (normally back in seconds) to help me find those RECENT specific time periods when the system has been substantially changed in some way using, this is using the DMV dm_os_ring_buffers This one helps me identify when I'm expericing Timeout Errors (1222).. modiy code to look for other error as highlight belowDECLARE @ts_now BIGINT,@dt_max BIGINT, @dt_min BIGINT SELECT @ts_now = cpu_ticks / CONVERT(FLOAT, cpu_ticks_in_ms) FROM sys.dm_os_sys_info SELECT @dt_max = MAX(timestamp), @dt_min = MIN(timestamp) FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_EXCEPTION' SELECT record_id ,DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS EventTime ,y.Error ,UserDefined ,b.description as NormalizedText FROM ( SELECT record.value('(./Record/@id)[1]', 'int') AS record_id, record.value('(./Record/Exception/Error)[1]', 'int') AS Error, record.value('(./Record/Exception/UserDefined)[1]', 'int') AS UserDefined, TIMESTAMP FROM ( SELECT TIMESTAMP, CONVERT(XML, record) AS record FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_EXCEPTION' AND record LIKE '% %' ) AS x ) AS y INNER JOIN sys.sysmessages b on y.Error = b.error WHERE b.msglangid = 1033 and y.Error = 1222 ORDER BY record_id DESC Sample Output record_id EventTime Error UserDefined NormalizedText 15199195 18/03/2010 14:00 1222 0 Lock request time out period exceeded. 15199194 18/03/2010 14:00 1222 0 Lock request time out period exceeded. 15199193 18/03/2010 14:00 1222 0 Lock request time out period exceeded. 15199192 18/03/2010 14:00 1222 0 Lock request time out period exceeded. 15199191 18/03/2010 14:00 1222 0 Lock request time out period exceeded. This one helps me identify when I have Unusally High Processing (> 50%) or # Page-FaultsSELECT record.value('(./Record/@id)[1]', 'int') AS record_id,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS SystemIdle,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLProcessUtilization,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/UserModeTime)[1]', 'bigint') AS UserModeTime,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime)[1]', 'bigint') AS KernelModeTime,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/PageFaults)[1]', 'bigint') AS PageFaults,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta)[1]', 'bigint') AS WorkingSetDelta,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization)[1]', 'int') AS MemoryUtilization,TIMESTAMPFROM ( SELECT TIMESTAMP, CONVERT(XML, record) AS record FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' AND record LIKE '% %' ) AS x Example: Showing entries > 50% SQL CPU record_id SystemIdle SQLProcessUtilization UserModeTime KernelModeTime PageFaults WorkingSetDelta MemoryUtilization TIMESTAMP 111916 66 29 36718750 1374843750 21333 -40960 100 7991061289 111917 54 41 50156250 1954062500 26914 -28672 100 7991121290 111918 57 39 42968750 1838437500 30096 20480 100 7991181290 111919 41 53 43906250 2530156250 22088 -4096 100 7991241307 111920 48 45 40937500 2124062500 26395 8192 100 7991301310 111921 52 43 35625000 2052812500 21996 155648 100 7991361311 111922 40 55 36875000 2637343750 33355 -262144 100 7991421311 111923 36 58 44843750 2786562500 47019 28672 100 7991481311 111924 31 64 53437500 3046562500 31027 61440 100 7991541314 111925 36 57 43906250 2711250000 37074 -8192 100 7991601317 111926 52 43 43437500 2060156250 29176 20480 100 7991661318 111927 71 24 33750000 1141250000 14478 16384 100 7991721320 111928 71 23 34531250 1116250000 12711 -20480 100 7991781320 111929 53 36 46562500 1714062500 26684 200704 100 7991841323 Finally one to provide some understanding of the level of memory state changes that are ocuringSELECT record.value('(./Record/@id)[1]', 'int') AS 'record_id',record.value('(./Record/ResourceMonitor/Notification)[1]', 'VARCHAR(100)') AS 'ReservedMemory',record.value('(./Record/ResourceMonitor/Indicators)[1]', 'int') AS 'Indicators',record.value('(./Record/ResourceMonitor/Effect/@state)[1]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect/@reversed)[1]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect)[1]', 'VARCHAR(100)') AS 'APPLY-HIGHPM',record.value('(./Record/ResourceMonitor/Effect/@state)[2]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect/@reversed)[2]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect)[2]', 'VARCHAR(100)') AS 'APPLY-HIGHPM',record.value('(./Record/ResourceMonitor/Effect/@state)[3]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect/@reversed)[3]', 'VARCHAR(100)') + ' - ' + record.value('(./Record/ResourceMonitor/Effect)[3]', 'VARCHAR(100)') AS 'REVERT_HIGHPM',record.value('(./Record/MemoryNode/ReservedMemory)[1]', 'int') AS 'ReservedMemory',record.value('(./Record/MemoryNode/CommittedMemory)[1]', 'int') AS 'CommittedMemory',record.value('(./Record/MemoryNode/SharedMemory)[1]', 'int') AS 'SharedMemory',record.value('(./Record/MemoryNode/AWEMemory)[1]', 'int') AS 'AWEMemory',record.value('(./Record/MemoryNode/SinglePagesMemory)[1]', 'int') AS 'SinglePagesMemory',record.value('(./Record/MemoryNode/CachedMemory)[1]', 'int') AS 'CachedMemory',record.value('(./Record/MemoryRecord/MemoryUtilization)[1]', 'int') AS 'MemoryUtilization',record.value('(./Record/MemoryRecord/TotalPhysicalMemory)[1]', 'int') AS 'TotalPhysicalMemory',record.value('(./Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'int') AS 'AvailablePhysicalMemory',record.value('(./Record/MemoryRecord/TotalPageFile)[1]', 'int') AS 'TotalPageFile',record.value('(./Record/MemoryRecord/AvailablePageFile)[1]', 'int') AS 'AvailablePageFile',record.value('(./Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS 'TotalVirtualAddressSpace',record.value('(./Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS 'AvailableVirtualAddressSpace',record.value('(./Record/MemoryRecord/AvailableExtendedVirtualAddressSpace)[1]', 'bigint') AS 'AvailableExtendedVirtualAddressSpace', TIMESTAMPFROM ( SELECT TIMESTAMP, CONVERT(XML, record) AS record FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_RESOURCE_MONITOR' AND record LIKE '% %' ) AS x

Read the article
Data Profiling without SSIS

Strangely enough for a predominantly SSIS blog, this post is all about how to perform data profiling without using SSIS. Whilst the Data Profiling Task is a worthy addition, there are a couple of limitations I’ve encountered of late. The first is that it requires SQL Server 2008, and not everyone is there yet. The second is that it can only target SQL Server 2005 and above. What about older systems, which are the ones that we probably need to investigate the most, or other vendor databases such as Oracle? With these limitations in mind I did some searching to find a quick and easy alternative to help me perform some data profiling for a project I was working on recently. I only had SQL Server 2005 available, and anyway most of my target source systems were Oracle, and of course I had short timescales. I looked at several options. Some never got beyond the download stage, they failed to install or just did not run, and others provided less than I could have produced myself by spending 2 minutes writing some basic SQL queries. In the end I settled on an open source product called DataCleaner. To quote from their website: DataCleaner is an Open Source application for profiling, validating and comparing data. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more. DataCleaner is developed in Java and licensed under LGPL. As quoted above it claims to support profiling, validating and comparing data, but I didn’t really get past the profiling functions, so won’t comment on the other two. The profiling whilst not prefect certainly saved some time compared to the limited alternatives. The ability to profile heterogeneous data sources is a big advantage over the SSIS option, and I found it overall quite easy to use and performance was good. I could see it struggling at times, but actually for what it does I was impressed. It had some data type niggles with Oracle, and some metrics seem a little strange, although thankfully they were easy to augment with some SQL queries to ensure a consistent picture. The report export options didn’t do it for me, but copy and paste with a bit of Excel magic was sufficient. One initial point for me personally is that I have had limited exposure to things of the Java persuasion and whilst I normally get by fine, sometimes the simplest things can throw me. For example installing a JDBC driver, why do I have to copy files to make it all work, has nobody ever heard of an MSI? In case there are other people out there like me who have become totally indoctrinated with the Microsoft software paradigm, I’ve written a quick start guide that details every step required. Steps 1- 5 are the key ones, the rest is really an excuse for some screenshots to show you the tool. Quick Start Guide Step 1 - Download Data Cleaner. The Microsoft Windows zipped exe option, and I chose the latest stable build, currently DataCleaner 1.5.3 (final). Extract the files to a suitable location. Step 2 - Download Java. If you try and run datacleaner.exe without Java it will warn you, and then open your default browser and take you to the Java download site. Follow the installation instructions from there, normally just click Download Java a couple of times and you’re done. Step 3 - Download Microsoft SQL Server JDBC Driver. You may have SQL Server installed, but you won’t have a JDBC driver. Version 3.0 is the latest as of April 2010. There is no real installer, we are in the Java world here, but run the exe you downloaded to extract the files. The default Unzip to folder is not much help, so try a fully qualified path such as C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\ to ensure you can find the files afterwards. Step 4 - If you wish to use Windows Authentication to connect to your SQL Server then first we need to copy a file so that Data Cleaner can find it. Browse to the JDBC extract location from Step 3 and drill down to the file sqljdbc_auth.dll. You will have to choose the correct directory for your processor architecture. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\auth\x86\sqljdbc_auth.dll. Now copy this file to the Data Cleaner extract folder you chose in Step 1. An alternative method is to edit datacleaner.cmd in the data cleaner extract folder as detailed in this data cleaner wiki topic, but I find copying the file simpler. Step 5 – Now lets run Data Cleaner, just run datacleaner.exe from the extract folder you chose in Step 1. Step 6 – Complete or skip the registration screen, and ignore the task window for now. In the main window click settings. Step 7 – In the Settings dialog, select the Database drivers tab, then click Register database driver and select the Local JAR file option. Step 8 – Browse to the JDBC driver extract location from Step 3 and drill down to select sqljdbc4.jar. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\sqljdbc4.jar Step 9 – Select the Database driver class as com.microsoft.sqlserver.jdbc.SQLServerDriver, and then click the Test and Save database driver button. Step 10 - You should be back at the Settings dialog with a the list of drivers that includes SQL Server. Just click Save Settings to persist all your hard work. Step 11 – Now we can start to profile some data. In the main Data Cleaner window click New Task, and then Profile from the task window. Step 12 – In the Profile window click Open Database Step 13 – Now choose the SQL Server connection string option. Selecting a connection string gives us a template like jdbc:sqlserver://<hostname>:1433;databaseName=<database>, but obviously it requires some details to be entered for example jdbc:sqlserver://localhost:1433;databaseName=SQLBits. This will connect to the database called SQLBits on my local machine. The port may also have to be changed if using such as when you have a multiple instances of SQL Server running. If using SQL Server Authentication enter a username and password as required and then click Connect to database. You can use Window Authentication, just add integratedSecurity=true to the end of your connection string. e.g jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true. If you didn’t complete Step 4 above you will need to do so now and restart Data Cleaner before it will work. Manually setting the connection string is fine, but creating a named connection makes more sense if you will be spending any length of time profiling a specific database. As highlighted in the left-hand screen-shot, at the bottom of the dialog it includes partial instructions on how to create named connections. In the folder shown C:\Users\<Username>\.datacleaner\1.5.3, open the datacleaner-config.xml file in your editor of choice add your own details. You’ll see a sample connection in the file already, just add yours following the same pattern. e.g.  <bean class="dk.eobjects.datacleaner.gui.model.NamedConnection"> <property name="name" value="SQLBits Local Connection" /> <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" /> <property name="connectionString" value="jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true" /> <property name="tableTypes"> <list> <value>TABLE</value> <value>VIEW</value> </list> </property> </bean> Step 14 – Once back at the Profile window, you should now see your schemas, tables and/or views listed down the left hand side. Browse this tree and double-click a table to select it for profiling. You can then click Add profile, and choose some profiling options, before finally clicking Run profiling. You can see below a sample output for three of the most common profiles, click the image for full size. I hope this has given you a taster for DataCleaner, and should help you get up and running pretty quickly.

Read the article
Oracle Data Integration Solutions and the Oracle EXADATA Database Machine

- by João Vilanova

Oracle's data integration solutions provide a complete, open and integrated solution for building, deploying, and managing real-time data-centric architectures in operational and analytical environments. Fully integrated with and optimized for the Oracle Exadata Database Machine, Oracle's data integration solutions take data integration to the next level and delivers extremeperformance and scalability for all the enterprise data movement and transformation needs. Easy-to-use, open and standards-based Oracle's data integration solutions dramatically improve productivity, provide unparalleled efficiency, and lower the cost of ownership.You can watch a video about this subject, after clicking on the link below.DIS for EXADATA Video

Read the article
BIWA Wednesday TechCast Series - Opposition to Data Warehouse Initiatives

- by jenny.gelhausen

BIWA Wednesday TechCast Series - 19th Event! Opposition to Data Warehouse Initiatives Please join us for this webcast on Wednesday, March 24, 12 noon Eastern or check your local area's time Webcast is open to clients, prospects and partners. No matter how good your technology and technical skills, organizational issues can derail a data warehousing or BI project. Therefore BIWA presents a vital topic that crosses product boundaries: organizational resistance to data warehouse initiatives - how to recognize it and what to do about it. Many a DW/BI professional has been surprised by organizational resistance to DW/BI initiatives. Yet real organizational imperatives may be behind this apparently irrational behavior. Based on in-depth interviews with IT professionals, industry consultants, and power users, our speaker Bruce Jenks will present his research findings about what drives organizational resistance to data warehouse initiatives. The talk will cover specific behaviors that can signal organizational resistance to a data warehouse program and what organizations have done to address such resistance. Presenter: Bruce Jenks of Dun and Bradstreet Bruce Jenks has over 20 years experience in data warehousing and business intelligence, much of it as a consultant to large organizations spanning the US. Bruce's data warehousing clients have included firms such as Sprint, Gallo Wines, Southern California Edison, The Gap, and Safeway. He started his data warehousing career at Metaphor Computers, a pioneering DW/BI firm from which a number of industry luminaries sprang including Ralph Kimball (author of The Data Warehouse Toolkit ). Bruce continued his data warehousing career at HP, Stanford University and other firms. Bruce is currently completing his doctorate in business administration at Golden Gate University, and today's material arises from his doctoral research. He is also a principal consultant for Dun and Bradstreet. Audio Dial-In: 866 682 4770 Audio Meeting ID: 1683901 Audio Meeting Passcode: 334451 Web Conference: Please register at https://www1.gotomeeting.com/register/807185273 After you register you will be provided with a link to the TechCast. Invitation to Speakers: All BIWA members and Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) may submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community. Submit your BIWA TechCast abstract today! BIWA is a worldwide forum with over 2000 members who are business intelligence, warehousing and analytics professionals. BIWA presents information, experiences and best practices in successfully deploying Oracle Database-centric BI, Data Warehousing, and Analytics products, features and Options--the Oracle Database "BIWA" platform. Attendance Information & Replays at the BIWA website: oraclebiwa.org var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-13185312-1"); pageTracker._trackPageview(); } catch(err) {}

Read the article
SQL SERVER – Challenge – Puzzle – Usage of FAST Hint

- by pinaldave

I was recently working with various SQL Server Hints. After working for a day on various hints, I realize that for one hint, I am not able to come up with good example. The hint is FAST. Let us look at the definition of the FAST hint from the Book On-Line. FAST number_rows Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set. Now the question is in what condition this hint can be useful. I have tried so many different combination, I have found this hint does not make much performance difference, infect I did not notice any change in time taken to load the resultset. I noticed that this hint does not change number of the page read to return result. Now when there is difference in performance is expected because if you read the what FAST hint does is that it only returns first few results FAST – which does not mean there will be difference in performance. I also understand that this hint gives the guidance/suggestions/hint to query optimizer that there are only 100 rows are in expected resultset. This tricking the optimizer to think there are only 100 rows and which (may) lead to render different execution plan than the one which it would have taken in normal case (without hint). Again, not necessarily, this will happen always. Now if you read above discussion, you will find that basic understanding of the hint is very clear to me but I still feel that I am missing something. Here are my questions: 1) In what condition this hint can be useful? What is the case, when someone want to see first few rows early because my experience suggests that when first few rows are rendered remaining rows are rendered as well. 2) Is there any way application can retrieve the fast fetched rows from SQL Server? 3) Do you use this hint in your application? Why? When? and How? Here are few examples I have attempted during the my experiment and found there is no difference in execution plan except its estimated number of rows are different leading optimizer think that the cost is less but in reality that is not the case. USE AdventureWorks GO SET STATISTICS IO ON SET STATISTICS TIME ON GO --------------------------------------------- -- Table Scan with Fast Hint SELECT * FROM Sales.SalesOrderDetail GO SELECT * FROM Sales.SalesOrderDetail OPTION (FAST 100) GO --------------------------------------------- -- Table Scan with Where on Index Key SELECT * FROM Sales.SalesOrderDetail WHERE OrderQty = 14 GO SELECT * FROM Sales.SalesOrderDetail WHERE OrderQty = 14 OPTION (FAST 100) GO --------------------------------------------- -- Table Scan with Where on Index Key SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderDetailID < 1000 GO SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderDetailID < 1000 OPTION (FAST 100) GO Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Improving WIF’s Claims-based Authorization - Part 3 (Usage)

- by Your DisplayName here!

In the previous posts I showed off some of the additions I made to WIF’s authorization infrastructure. I now want to show some samples how I actually use these extensions. The following code snippets are from Thinktecture.IdentityServer on Codeplex. The following shows the MVC attribute on the WS-Federation controller: [ClaimsAuthorize(Constants.Actions.Issue, Constants.Resources.WSFederation)] public class WSFederationController : Controller or… [ClaimsAuthorize(Constants.Actions.Administration, Constants.Resources.RelyingParty)] public class RelyingPartiesAdminController : Controller In other places I used the imperative approach (e.g. the WRAP endpoint): if (!ClaimsAuthorize.CheckAccess(principal, Constants.Actions.Issue, Constants.Resources.WRAP)) { Tracing.Error("User not authorized"); return new UnauthorizedResult("WRAP", true); } For the WCF WS-Trust endpoints I decided to use the per-request approach since the SOAP actions are well defined here. The corresponding authorization manager roughly looks like this: public class AuthorizationManager : ClaimsAuthorizationManager { public override bool CheckAccess(AuthorizationContext context) { var action = context.Action.First(); var id = context.Principal.Identities.First(); // if application authorization request if (action.ClaimType.Equals(ClaimsAuthorize.ActionType)) { return AuthorizeCore(action, context.Resource, context.Principal.Identity as IClaimsIdentity); } // if ws-trust issue request if (action.Value.Equals(WSTrust13Constants.Actions.Issue)) { return AuthorizeTokenIssuance(new Collection<Claim> { new Claim(ClaimsAuthorize.ResourceType, Constants.Resources.WSTrust) }, id); } return base.CheckAccess(context); } } You see that it is really easy now to distinguish between per-request and application authorization which makes the overall design much easier. HTH

Read the article
SQL Monitor’s data repository

- by Chris Lambrou

As one of the developers of SQL Monitor, I often get requests passed on by our support people from customers who are looking to dip into SQL Monitor’s own data repository, in order to pull out bits of information that they’re interested in. Since there’s clearly interest out there in playing around directly with the data repository, I thought I’d write some blog posts to start to describe how it all works. The hardest part for me is knowing where to begin, since the schema of the data repository is pretty big. Hmmm… I guess it’s tricky for anyone to write anything but the most trivial of queries against the data repository without understanding the hierarchy of monitored objects, so perhaps my first post should start there. I always imagine that whenever a customer fires up SSMS and starts to explore their SQL Monitor data repository database, they become immediately bewildered by the schema – that was certainly my experience when I did so for the first time. The following query shows the number of different object types in the data repository schema: SELECT type_desc, COUNT(*) AS [count] FROM sys.objects GROUP BY type_desc ORDER BY type_desc; type_desccount 1DEFAULT_CONSTRAINT63 2FOREIGN_KEY_CONSTRAINT181 3INTERNAL_TABLE3 4PRIMARY_KEY_CONSTRAINT190 5SERVICE_QUEUE3 6SQL_INLINE_TABLE_VALUED_FUNCTION381 7SQL_SCALAR_FUNCTION2 8SQL_STORED_PROCEDURE100 9SYSTEM_TABLE41 10UNIQUE_CONSTRAINT54 11USER_TABLE193 12VIEW124 With 193 tables, 124 views, 100 stored procedures and 381 table valued functions, that’s quite a hefty schema, and when you browse through it using SSMS, it can be a bit daunting at first. So, where to begin? Well, let’s narrow things down a bit and only look at the tables belonging to the data schema. That’s where all of the collected monitoring data is stored by SQL Monitor. The following query gives us the names of those tables: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' ORDER BY sch.name, obj.name; This query still returns 110 tables. I won’t show them all here, but let’s have a look at the first few of them: name 1data.Cluster_Keys 2data.Cluster_Machine_ClockSkew_UnstableSamples 3data.Cluster_Machine_Cluster_StableSamples 4data.Cluster_Machine_Keys 5data.Cluster_Machine_LogicalDisk_Capacity_StableSamples 6data.Cluster_Machine_LogicalDisk_Keys 7data.Cluster_Machine_LogicalDisk_Sightings 8data.Cluster_Machine_LogicalDisk_UnstableSamples 9data.Cluster_Machine_LogicalDisk_Volume_StableSamples 10data.Cluster_Machine_Memory_Capacity_StableSamples 11data.Cluster_Machine_Memory_UnstableSamples 12data.Cluster_Machine_Network_Capacity_StableSamples 13data.Cluster_Machine_Network_Keys 14data.Cluster_Machine_Network_Sightings 15data.Cluster_Machine_Network_UnstableSamples 16data.Cluster_Machine_OperatingSystem_StableSamples 17data.Cluster_Machine_Ping_UnstableSamples 18data.Cluster_Machine_Process_Instances 19data.Cluster_Machine_Process_Keys 20data.Cluster_Machine_Process_Owner_Instances 21data.Cluster_Machine_Process_Sightings 22data.Cluster_Machine_Process_UnstableSamples 23… There are two things I want to draw your attention to: The table names describe a hierarchy of the different types of object that are monitored by SQL Monitor (e.g. clusters, machines and disks). For each object type in the hierarchy, there are multiple tables, ending in the suffixes _Keys, _Sightings, _StableSamples and _UnstableSamples. Not every object type has a table for every suffix, but the _Keys suffix is especially important and a _Keys table does indeed exist for every object type. In fact, if we limit the query to return only those tables ending in _Keys, we reveal the full object hierarchy: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' AND obj.name LIKE '%_Keys' ORDER BY sch.name, obj.name; name 1data.Cluster_Keys 2data.Cluster_Machine_Keys 3data.Cluster_Machine_LogicalDisk_Keys 4data.Cluster_Machine_Network_Keys 5data.Cluster_Machine_Process_Keys 6data.Cluster_Machine_Services_Keys 7data.Cluster_ResourceGroup_Keys 8data.Cluster_ResourceGroup_Resource_Keys 9data.Cluster_SqlServer_Agent_Job_History_Keys 10data.Cluster_SqlServer_Agent_Job_Keys 11data.Cluster_SqlServer_Database_BackupType_Backup_Keys 12data.Cluster_SqlServer_Database_BackupType_Keys 13data.Cluster_SqlServer_Database_CustomMetric_Keys 14data.Cluster_SqlServer_Database_File_Keys 15data.Cluster_SqlServer_Database_Keys 16data.Cluster_SqlServer_Database_Table_Index_Keys 17data.Cluster_SqlServer_Database_Table_Keys 18data.Cluster_SqlServer_Error_Keys 19data.Cluster_SqlServer_Keys 20data.Cluster_SqlServer_Services_Keys 21data.Cluster_SqlServer_SqlProcess_Keys 22data.Cluster_SqlServer_TopQueries_Keys 23data.Cluster_SqlServer_Trace_Keys 24data.Group_Keys The full object type hierarchy looks like this: Cluster Machine LogicalDisk Network Process Services ResourceGroup Resource SqlServer Agent Job History Database BackupType Backup CustomMetric File Table Index Error Services SqlProcess TopQueries Trace Group Okay, but what about the individual objects themselves represented at each level in this hierarchy? Well that’s what the _Keys tables are for. This is probably best illustrated by way of a simple example – how can I query my own data repository to find the databases on my own PC for which monitoring data has been collected? Like this: SELECT clstr._Name AS cluster_name, srvr._Name AS instance_name, db._Name AS database_name FROM data.Cluster_SqlServer_Database_Keys db JOIN data.Cluster_SqlServer_Keys srvr ON db.ParentId = srvr.Id -- Note here how the parent of a Database is a Server JOIN data.Cluster_Keys clstr ON srvr.ParentId = clstr.Id -- Note here how the parent of a Server is a Cluster WHERE clstr._Name = 'dev-chrisl2' -- This is the hostname of my own PC ORDER BY clstr._Name, srvr._Name, db._Name; cluster_nameinstance_namedatabase_name 1dev-chrisl2SqlMonitorData 2dev-chrisl2master 3dev-chrisl2model 4dev-chrisl2msdb 5dev-chrisl2mssqlsystemresource 6dev-chrisl2tempdb 7dev-chrisl2sql2005SqlMonitorData 8dev-chrisl2sql2005TestDatabase 9dev-chrisl2sql2005master 10dev-chrisl2sql2005model 11dev-chrisl2sql2005msdb 12dev-chrisl2sql2005mssqlsystemresource 13dev-chrisl2sql2005tempdb 14dev-chrisl2sql2008SqlMonitorData 15dev-chrisl2sql2008master 16dev-chrisl2sql2008model 17dev-chrisl2sql2008msdb 18dev-chrisl2sql2008mssqlsystemresource 19dev-chrisl2sql2008tempdb These results show that I have three SQL Server instances on my machine (a default instance, one named sql2005 and one named sql2008), and each instance has the usual set of system databases, along with a database named SqlMonitorData. Basically, this is where I test SQL Monitor on different versions of SQL Server, when I’m developing. There are a few important things we can learn from this query: Each _Keys table has a column named Id. This is the primary key. Each _Keys table has a column named ParentId. A foreign key relationship is defined between each _Keys table and its parent _Keys table in the hierarchy. There are two exceptions to this, Cluster_Keys and Group_Keys, because clusters and groups live at the root level of the object hierarchy. Each _Keys table has a column named _Name. This is used to uniquely identify objects in the table within the scope of the same shared parent object. Actually, that last item isn’t always true. In some cases, the _Name column is actually called something else. For example, the data.Cluster_Machine_Services_Keys table has a column named _ServiceName instead of _Name (sorry for the inconsistency). In other cases, a name isn’t sufficient to uniquely identify an object. For example, right now my PC has multiple processes running, all sharing the same name, Chrome (one for each tab open in my web-browser). In such cases, multiple columns are used to uniquely identify an object within the scope of the same shared parent object. Well, that’s it for now. I’ve given you enough information for you to explore the _Keys tables to see how objects are stored in your own data repositories. In a future post, I’ll try to explain how monitoring data is stored for each object, using the _StableSamples and _UnstableSamples tables. If you have any questions about this post, or suggestions for future posts, just submit them in the comments section below.

Read the article
SQL – Migrate Database from SQL Server to NuoDB – A Quick Tutorial

- by Pinal Dave

Data is growing exponentially and every organization with growing data is thinking of next big innovation in the world of Big Data. Big data is a indeed a future for every organization at one point of the time. Just like every other next big thing, big data has its own challenges and issues. The biggest challenge associated with the big data is to find the ideal platform which supports the scalability and growth of the data. If you are a regular reader of this blog, you must be familiar with NuoDB. I have been working with NuoDB for a while and their recent release is the best thus far. NuoDB is an elastically scalable SQL database that can run on local host, datacenter and cloud-based resources. A key feature of the product is that it does not require sharding (read more here). Last week, I was able to install NuoDB in less than 90 seconds and have explored their Explorer and Admin sections. You can read about my experiences in these posts: SQL – Step by Step Guide to Download and Install NuoDB – Getting Started with NuoDB SQL – Quick Start with Admin Sections of NuoDB – Manage NuoDB Database SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database Many SQL Authority readers have been following me in my journey to evaluate NuoDB. One of the frequently asked questions I’ve received from you is if there is any way to migrate data from SQL Server to NuoDB. The fact is that there is indeed a way to do so and NuoDB provides a fantastic tool which can help users to do it. NuoDB Migrator is a command line utility that supports the migration of Microsoft SQL Server, MySQL, Oracle, and PostgreSQL schemas and data to NuoDB. The migration to NuoDB is a three-step process: NuoDB Migrator generates a schema for a target NuoDB database It loads data into the target NuoDB database It dumps data from the source database Let’s see how we can migrate our data from SQL Server to NuoDB using a simple three-step approach. But before we do that we will create a sample database in MSSQL and later we will migrate the same database to NuoDB: Setup Step 1: Build a sample data CREATE DATABASE [Test]; CREATE TABLE [Department]( [DepartmentID] [smallint] NOT NULL, [Name] VARCHAR(100) NOT NULL, [GroupName] VARCHAR(100) NOT NULL, [ModifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_Department_DepartmentID] PRIMARY KEY CLUSTERED ( [DepartmentID] ASC ) ) ON [PRIMARY]; INSERT INTO Department SELECT * FROM AdventureWorks2012.HumanResources.Department; Note that I am using the SQL Server AdventureWorks database to build this sample table but you can build this sample table any way you prefer. Setup Step 2: Install Java 64 bit Before you can begin the migration process to NuoDB, make sure you have 64-bit Java installed on your computer. This is due to the fact that the NuoDB Migrator tool is built in Java. You can download 64-bit Java for Windows, Mac OSX, or Linux from the following link: http://java.com/en/download/manual.jsp. One more thing to remember is that you make sure that the path in your environment settings is set to your JAVA_HOME directory or else the tool will not work. Here is how you can do it: Go to My Computer >> Right Click >> Select Properties >> Click on Advanced System Settings >> Click on Environment Variables >> Click on New and enter the following values. Variable Name: JAVA_HOME Variable Value: C:\Program Files\Java\jre7 Make sure you enter your Java installation directory in the Variable Value field. Setup Step 3: Install JDBC driver for SQL Server. There are two JDBC drivers available for SQL Server. Select the one you prefer to use by following one of the two links below: Microsoft JDBC Driver jTDS JDBC Driver In this example we will be using jTDS JDBC driver. Once you download the driver, move the driver to your NuoDB installation folder. In my case, I have moved the JAR file of the driver into the C:\Program Files\NuoDB\tools\migrator\jar folder as this is my NuoDB installation directory. Now we are all set to start the three-step migration process from SQL Server to NuoDB: Migration Step 1: NuoDB Schema Generation Here is the command I use to generate a schema of my SQL Server Database in NuoDB. First I go to the folder C:\Program Files\NuoDB\tools\migrator\bin and execute the nuodb-migrator.bat file. Note that my database name is ‘test’. Additionally my username and password is also ‘test’. You can see that my SQL Server database is running on my localhost on port 1433. Additionally, the schema of the table is ‘dbo’. nuodb-migrator schema –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.path=/tmp/schema.sql The above script will generate a schema of all my SQL Server tables and will put it in the folder C:\tmp\schema.sql . You can open the schema.sql file and execute this file directly in your NuoDB instance. You can follow the link here to see how you can execute the SQL script in NuoDB. Please note that if you have not yet created the schema in the NuoDB database, you should create it before executing this step. Step 2: Generate the Dump File of the Data Once you have recreated your schema in NuoDB from SQL Server, the next step is very easy. Here we create a CSV format dump file, which will contain all the data from all the tables from the SQL Server database. The command to do so is very similar to the above command. Be aware that this step may take a bit of time based on your database size. nuodb-migrator dump –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.type=csv –output.path=/tmp/dump.cat Once the above command is successfully executed you can find your CSV file in the C:\tmp\ folder. However, you do not have to do anything manually. The third and final step will take care of completing the migration process. Migration Step 3: Load the Data into NuoDB After building schema and taking a dump of the data, the very next step is essential and crucial. It will take the CSV file and load it into the NuoDB database. nuodb-migrator load –target.url=jdbc:com.nuodb://localhost:48004/mytest –target.schema=dbo –target.username=test –target.password=test –input.path=/tmp/dump.cat Please note that in the above script we are now targeting the NuoDB database, which we have already created with the name of “MyTest”. If the database does not exist, create it manually before executing the above script. I have kept the username and password as “test”, but please make sure that you create a more secure password for your database for security reasons. Voila! You’re Done That’s it. You are done. It took 3 setup and 3 migration steps to migrate your SQL Server database to NuoDB. You can now start exploring the database and build excellent, scale-out applications. In this blog post, I have done my best to come up with simple and easy process, which you can follow to migrate your app from SQL Server to NuoDB. Download NuoDB I strongly encourage you to download NuoDB and go through my 3-step migration tutorial from SQL Server to NuoDB. Additionally here are two very important blog post from NuoDB CTO Seth Proctor. He has written excellent blog posts on the concept of the Administrative Domains. NuoDB has this concept of an Administrative Domain, which is a collection of hosts that can run one or multiple databases. Each database has its own TEs and SMs, but all are managed within the Admin Console for that particular domain. http://www.nuodb.com/techblog/2013/03/11/getting-started-provisioning-a-domain/ http://www.nuodb.com/techblog/2013/03/14/getting-started-running-a-database/ Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

Read the article
Recovering from apt-get upgrade gone wrong due to a full disk

- by Peter

I was performing an apt-get upgrade on an Ubuntu 12.04.5 LTS box that hadn't been updated in a little while and the upgrade failed due to 'No space left on device'. After a little while I worked out space meant inodes and I have freed some up but unfortunately things have been left something askew. I have tried manually installing the old versions of packages mentioned using dpkg -i but that doesn't help. I have tried apt-get upgrade and apt-get -f install to no avail. Results are below. Any ideas how to fix things up? FIXED: Installing the earlier versions again manually via dpkg -i and then apt-get -f install has done the trick. Not sure why this didn't work the first time. The packages in question are listed below but they will presumably vary. libssl1.0.0_1.0.1-4ubuntu5.14_i386.deb linux-headers-3.2.0-64-generic-pae_3.2.0-64.97_i386.deb linux-image-generic-pae_3.2.0.64.76_i386.deb linux-headers-3.2.0-64_3.2.0-64.97_all.deb linux-headers-generic-pae_3.2.0.64.76_i386.deb root@unlinked:/tmp# apt-get upgrade Reading package lists... Done Building dependency tree Reading state information... Done You might want to run ‘apt-get -f install’ to correct these. The following packages have unmet dependencies. libssl-dev : Depends: libssl1.0.0 (= 1.0.1-4ubuntu5.14) but 1.0.1-4ubuntu5.17 is installed linux-generic-pae : Depends: linux-image-generic-pae (= 3.2.0.64.76) but 3.2.0.67.79 is installed Depends: linux-headers-generic-pae (= 3.2.0.64.76) but 3.2.0.67.79 is installed E: Unmet dependencies. Try using -f. root@unlinked:/tmp# apt-get -f install Reading package lists... Done Building dependency tree Reading state information... Done Correcting dependencies... Done The following packages were automatically installed and are no longer required: linux-headers-3.2.0-43-generic-pae linux-headers-3.2.0-38-generic-pae linux-headers-3.2.0-41-generic-pae linux-headers-3.2.0-36-generic-pae linux-headers-3.2.0-63-generic-pae linux-headers-3.2.0-58-generic-pae linux-headers-3.2.0-60-generic-pae linux-headers-3.2.0-55-generic-pae linux-headers-3.2.0-40 linux-headers-3.2.0-41 linux-headers-3.2.0-36 linux-headers-3.2.0-37 linux-headers-3.2.0-43 linux-headers-3.2.0-38 linux-headers-3.2.0-44 linux-headers-3.2.0-39 linux-headers-3.2.0-45 linux-headers-3.2.0-51 linux-headers-3.2.0-52 linux-headers-3.2.0-53 linux-headers-3.2.0-48 linux-headers-3.2.0-54 linux-headers-3.2.0-60 linux-headers-3.2.0-55 linux-headers-3.2.0-61 linux-headers-3.2.0-56 linux-headers-3.2.0-57 linux-headers-3.2.0-63 linux-headers-3.2.0-58 linux-headers-3.2.0-59 linux-headers-3.2.0-52-generic-pae linux-headers-3.2.0-44-generic-pae linux-headers-3.2.0-39-generic-pae linux-headers-3.2.0-37-generic-pae linux-headers-3.2.0-59-generic-pae linux-headers-3.2.0-61-generic-pae linux-headers-3.2.0-56-generic-pae linux-headers-3.2.0-53-generic-pae linux-headers-3.2.0-48-generic-pae linux-headers-3.2.0-45-generic-pae linux-headers-3.2.0-40-generic-pae linux-headers-3.2.0-57-generic-pae linux-headers-3.2.0-54-generic-pae linux-headers-3.2.0-51-generic-pae Use 'apt-get autoremove' to remove them. The following extra packages will be installed: libssl-dev linux-generic-pae The following packages will be upgraded: libssl-dev linux-generic-pae 2 to upgrade, 0 to newly install, 0 to remove and 0 not to upgrade. 2 not fully installed or removed. Need to get 0 B/1,427 kB of archives. After this operation, 1,024 B of additional disk space will be used. Do you want to continue [Y/n]? y dpkg: dependency problems prevent configuration of libssl-dev: libssl-dev depends on libssl1.0.0 (= 1.0.1-4ubuntu5.14); however: Version of libssl1.0.0 on system is 1.0.1-4ubuntu5.17. dpkg: error processing libssl-dev (--configure): dependency problems - leaving unconfigured No apport report written because the error message indicates it's a follow-up error from a previous failure. dpkg: dependency problems prevent configuration of linux-generic-pae: linux-generic-pae depends on linux-image-generic-pae (= 3.2.0.64.76); however: Version of linux-image-generic-pae on system is 3.2.0.67.79. linux-generic-pae depends on linux-headers-generic-pae (= 3.2.0.64.76); however: Version of linux-headers-generic-pae on system is 3.2.0.67.79. dpkg: error processing linux-generic-pae (--configure): dependency problems - leaving unconfigured No apport report written because the error message indicates it's a follow-up error from a previous failure. Errors were encountered while processing: libssl-dev linux-generic-pae E: Sub-process /usr/bin/dpkg returned an error code (1)

Read the article
Ameristar Wins with Oracle GoldenGate’s Heterogeneous Real-Time Data Integration

- by Irem Radzik

Today we announced a press release about another successful project with Oracle GoldenGate. This time at Ameristar. Ameristar is a casino gaming company and needed a single data integration solution to connect multiple heterogeneous systems to its Teradata data warehouse. The project involves integration of Ameristar’s promotional and gaming data from 14 data sources across its 7 casino hotel properties in real time into a central Teradata data warehouse. The source systems include the Aristocrat gaming and MGT promotional management platforms running on Microsoft SQL Server 2000 databases. As you can notice, there was no Oracle Database involved in this project, but Ameristar’s IT leadership knew that GoldenGate’s strong heterogeneous and real-time data integration capabilities is the right technology for their data warehousing project. With GoldenGate Ameristar was able to reduce data latency to the enterprise data warehouse, and use this real-time customer information for marketing teams in improving overall customer experience. Ameristar customers receive more targeted and timely campaign offers, and the company has more up-to-date visibility into financial metrics of the company. One other key benefit the company experienced with GoldenGate is in operational costs. The previous data capture solution Ameristar used was trigger based and required a lot of effort to manage. They needed dedicated IT staff to maintain it. With GoldenGate, the solution runs seamlessly without needing a fully-dedicated staff, giving the IT team at Ameristar more resources for their other IT projects. If you want to learn more about GoldenGate and the latest features for Oracle Database and non-Oracle databases, please watch our on demand webcast about Oracle GoldenGate 11g Release 2.

Read the article
Out of disk space - /boot at 100%

- by uvasal

My /boot is at 100%. When I run aptitude search ~ilinux-image I'm getting loads of unused images. When I try to delete one of them (after checking which one is currently in use by doing uname -r), e.g apt-get autoremove linux-image-3.2.0-44-generic I get: Reading package lists... Done Building dependency tree Reading state information... Done You might want to run 'apt-get -f install' to correct these: The following packages have unmet dependencies: linux-generic : Depends: linux-headers-generic (= 3.2.0.51.61) but 3.2.0.54.64 is to be installed linux-server : Depends: linux-headers-server (= 3.2.0.51.61) but 3.2.0.54.64 is to be installed E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution). And running apt-get -f install throws No space left on device. I've also tried doing apt-get purge but I am getting the same thing. Output of df -h and dpkg -l linux-*.: root@hb2088:/srv/www# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 9.4G 3.0G 6.0G 34% / udev 301M 4.0K 301M 1% /dev tmpfs 124M 228K 124M 1% /run none 5.0M 0 5.0M 0% /run/lock none 309M 0 309M 0% /run/shm /dev/sda1 92M 91M 0 100% /boot root@hb2088:/srv/www# dpkg -l linux-* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Description +++-====================================================-====================================================-======================================================================================================================== un linux-doc-3.2.0 <none> (no description available) ii linux-firmware 1.79.6 Firmware for Linux kernel drivers iU linux-generic 3.2.0.51.61 Complete Generic Linux kernel un linux-headers <none> (no description available) un linux-headers-3 <none> (no description available) un linux-headers-3.0 <none> (no description available) ii linux-headers-3.2.0-44 3.2.0-44.69 Header files related to Linux kernel version 3.2.0 ii linux-headers-3.2.0-44-generic 3.2.0-44.69 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP ii linux-headers-3.2.0-45 3.2.0-45.70 Header files related to Linux kernel version 3.2.0 ii linux-headers-3.2.0-45-generic 3.2.0-45.70 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP ii linux-headers-3.2.0-48 3.2.0-48.74 Header files related to Linux kernel version 3.2.0 ii linux-headers-3.2.0-48-generic 3.2.0-48.74 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP ii linux-headers-3.2.0-51 3.2.0-51.77 Header files related to Linux kernel version 3.2.0 ii linux-headers-3.2.0-51-generic 3.2.0-51.77 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP ii linux-headers-3.2.0-52 3.2.0-52.78 Header files related to Linux kernel version 3.2.0 ii linux-headers-3.2.0-52-generic 3.2.0-52.78 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP iU linux-headers-3.2.0-54 3.2.0-54.82 Header files related to Linux kernel version 3.2.0 iU linux-headers-3.2.0-54-generic 3.2.0-54.82 Linux kernel headers for version 3.2.0 on 64 bit x86 SMP iU linux-headers-generic 3.2.0.54.64 Generic Linux kernel headers iU linux-headers-server 3.2.0.54.64 Linux kernel headers on Server Equipment. un linux-image <none> (no description available) un linux-image-3.0 <none> (no description available) ii linux-image-3.2.0-44-generic 3.2.0-44.69 Linux kernel image for version 3.2.0 on 64 bit x86 SMP ii linux-image-3.2.0-45-generic 3.2.0-45.70 Linux kernel image for version 3.2.0 on 64 bit x86 SMP ii linux-image-3.2.0-48-generic 3.2.0-48.74 Linux kernel image for version 3.2.0 on 64 bit x86 SMP iF linux-image-3.2.0-51-generic 3.2.0-51.77 Linux kernel image for version 3.2.0 on 64 bit x86 SMP iF linux-image-3.2.0-52-generic 3.2.0-52.78 Linux kernel image for version 3.2.0 on 64 bit x86 SMP in linux-image-3.2.0-54-generic <none> (no description available) iU linux-image-generic 3.2.0.51.61 Generic Linux kernel image iU linux-image-server 3.2.0.51.61 Linux kernel image on Server Equipment. un linux-initramfs-tool <none> (no description available) un linux-kernel-headers <none> (no description available) un linux-kernel-log-daemon <none> (no description available) ii linux-libc-dev 3.2.0-52.78 Linux Kernel Headers for development un linux-restricted-common <none> (no description available) iU linux-server 3.2.0.51.61 Complete Linux kernel on Server Equipment. un linux-source-3.2.0 <none> (no description available) un linux-tools <none> (no description available) Output of du -sh /boot/*: root@hb2088:~# du -sh /boot/* 781K /boot/abi-3.2.0-44-generic 781K /boot/abi-3.2.0-45-generic 781K /boot/abi-3.2.0-48-generic 781K /boot/abi-3.2.0-51-generic 781K /boot/abi-3.2.0-52-generic 139K /boot/config-3.2.0-44-generic 139K /boot/config-3.2.0-45-generic 139K /boot/config-3.2.0-48-generic 139K /boot/config-3.2.0-51-generic 139K /boot/config-3.2.0-52-generic 1.6M /boot/grub 14M /boot/initrd.img-3.2.0-44-generic 14M /boot/initrd.img-3.2.0-45-generic 14M /boot/initrd.img-3.2.0-48-generic 12K /boot/lost+found 174K /boot/memtest86+.bin 176K /boot/memtest86+_multiboot.bin 2.8M /boot/System.map-3.2.0-44-generic 2.8M /boot/System.map-3.2.0-45-generic 2.8M /boot/System.map-3.2.0-48-generic 2.8M /boot/System.map-3.2.0-51-generic 2.8M /boot/System.map-3.2.0-52-generic 4.8M /boot/vmlinuz-3.2.0-44-generic 4.8M /boot/vmlinuz-3.2.0-45-generic 4.8M /boot/vmlinuz-3.2.0-48-generic 4.8M /boot/vmlinuz-3.2.0-51-generic 4.8M /boot/vmlinuz-3.2.0-52-generic

Read the article
Changing the default installation path to a newly installed hard disk

- by mgj

Hi, I am currently working on a dual-booted PC. I am using Windows XP and Ubuntu 10.04 Lucid Lynx released in April 2010. The allocated partition to Ubuntu that I am making use of has almost exhausted. Current memory allocations on the PC wrt Ubuntu OS looks like this: bodhgaya@pc146724-desktop:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda2 8.6G 8.0G 113M 99% / none 998M 268K 998M 1% /dev none 1002M 580K 1002M 1% /dev/shm none 1002M 100K 1002M 1% /var/run none 1002M 0 1002M 0% /var/lock none 1002M 0 1002M 0% /lib/init/rw /dev/sda1 25G 16G 9.8G 62% /media/C /dev/sdb1 37G 214M 35G 1% /media/ubuntulinuxstore bodhgaya@pc146724-desktop:~$ cd /tmp I am trying to mount a 40GB(/dev/sdb1 - given below) new hard disk along with my existing Ubuntu system to overcome with hard disk space related issues. I referred to the following tutorial to mount a new hard disk onto the system:- http://www.smorgasbord.net/how-to-in...untu-linux%20/ I was able to successfully mount this hard disk for Ubuntu 0S. I have this new hard disk setup in /media/ubuntulinuxstore directory. The current partition in my system looks like this: bodhgaya@pc146724-desktop:/media/ubuntulinuxstore$ sudo fdisk -l [sudo] password for bodhgaya: Disk /dev/sda: 40.0 GB, 40000000000 bytes 255 heads, 63 sectors/track, 4863 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x446eceb5 Device Boot Start End Blocks Id System /dev/sda1 * 2 3264 26210047+ 7 HPFS/NTFS /dev/sda2 3265 4385 9004432+ 83 Linux /dev/sda3 4386 4863 3839535 82 Linux swap / Solaris Disk /dev/sdb: 40.0 GB, 40000000000 bytes 255 heads, 63 sectors/track, 4863 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0xfa8afa8a Device Boot Start End Blocks Id System /dev/sdb1 1 4862 39053983+ 7 HPFS/NTFS bodhgaya@pc146724-desktop:/media/ubuntulinuxstore$ Now, I have a concern wrt the "location" where the new softwares will be installed. Generally softwares are installed via the terminal and by default a fixed path is used to where the post installation set up files can be found (I am talking in context of the drive). This is like the typical case of Windows, where softwares by default are installed in the C: drive. These days people customize their installations to a drive which they find apt to serve their purpose (generally based on availability of hard disk space). I am trying to figure out how to customize the same for Ubuntu. As we all know the most softwares are installed via commands given from the Terminal. My road block is how do I redirect the default path set on the terminal where files get installed to this new hard disk. This if done will help me overcome space constraints I am currently facing wrt the partition on which my Ubuntu is initially installed. I would also by this, save time on not formatting my system and reinstalling Ubuntu and other softwares all over again. Please help me with this, your suggestions are much appreciated.

Read the article
Problem with SLATEC routine usage with gfortran

- by user39461

I am trying to compute the Bessel function of the second kind (Bessel_y) using the SLATEC's Amos library available on Netlib. Here is the SLATEC code I use. Below I have pasted my test program that calls SLATEC routine CBESY. PROGRAM BESSELTEST IMPLICIT NONE REAL:: FNU INTEGER, PARAMETER :: N = 2, KODE = 1 COMPLEX,ALLOCATABLE :: CWRK (:), CY (:) COMPLEX:: Z, ci INTEGER :: NZ, IERR ALLOCATE(CWRK(N), CY(N)) ci = cmplx (0.0, 1.0) FNU = 0.0e0 Z = CMPLX(0.3e0, 0.4e0) CALL CBESY(Z, FNU, KODE, N, CY, NZ, CWRK, IERR) WRITE(*,*) 'CY: ', CY WRITE(*,*) 'IERR: ', IERR STOP END PROGRAM And here is the output of the above program: CY: ( 5.78591091E-39, 5.80327020E-39) ( 0.0000000 , 0.0000000 ) IERR: 4 Ierr = 4 meaning there is some problem with the input itself. To be precise, the IERR = 4 means the following as per the header info in CBESY.f file: ! IERR=4, CABS(Z) OR FNU+N-1 TOO LARGE - NO COMPUTA- ! TION BECAUSE OF COMPLETE LOSSES OF SIGNIFI- ! CANCE BY ARGUMENT REDUCTION Clearly, CABS(Z) (which is 0.50) or FNU + N - 1 (which is 1.0) are not too large but still the routine CBESY throws the error message number 4 as above. The CY array should have following values for the argument given in above code: CY(1) = -0.4983 + 0.6700i CY(2) = -1.0149 + 0.9485i These values are computed using Matlab. I can't figure out what's the problem when I call CBESY from SLATEC library. Any clues? Much thanks for the suggestions/help. PS: if it is of any help, I used gfortran to compile, link and then create the SLATEC library file ( the .a file ) which I keep in the same directory as my test program above. shell command to execute above code: gfortran -c BesselTest.f95 gfortran -o a *.o libslatec.a a GD.

Read the article
New licensing for SQL Server 2012 and #BISM #Tabular usage

- by Marco Russo (SQLBI)

Last week Microsoft announced a new licensing schema for SQL Server 2012. If you are interested in an extensive discussion of the new licensing scheme, Denny Cherry wrote a great blog post about that. I’d like to comment about the new BI Edition license. Teo Lachev already commented about the numbers and I agree with him. I generally like the new licensing mode of SQL 2012. It maintains a very low-entry barrier for SSRS/SSAS/SSIS (Standard Edition). It has a reasonable licensing schema for 20-50...(read more)

Read the article
Oracle 11g Object privileges and synonym usage

Many security breaches occur across user granted privileges and floating unused synonyms. That being said, the question becomes, how can we get a handle on this. Read on to learn more...

Read the article
Automating SQL Execution Plan analysis

- by jchang

Last year, I made my tool for automating execution plan analysis available on www.qdpma.com The original version could parse execution plans from sys.dm_exec_query_stats or dm_exec_cached_plans and generate a cross-reference of which execution plans employed each index. The DMV sys.dm_db_index_usage_stats shows how often each index is used, but not where, that is, which particular stored procedure or My latest version can now also 1) use the DMV sys.dm_exec_procedure_stats, 2) it can also get the...(read more)

Read the article
Title Tag 101 - The Proper Usage and the Importance of the Title Tag

The title tag is one of the most important tags because the search engines use it to describe your page in the search results page. After you press the "search" button you are taken to the listings page with the results.

Read the article
Inverted schedctl usage in the JVM

- by Dave

The schedctl facility in Solaris allows a thread to request that the kernel defer involuntary preemption for a brief period. The mechanism is strictly advisory - the kernel can opt to ignore the request. Schedctl is typically used to bracket lock critical sections. That, in turn, can avoid convoying -- threads piling up on a critical section behind a preempted lock-holder -- and other lock-related performance pathologies. If you're interested see the man pages for schedctl_start() and schedctl_stop() and the schedctl.h include file. The implementation is very efficient. schedctl_start(), which asks that preemption be deferred, simply stores into a thread-specific structure -- the schedctl block -- that the kernel maps into user-space. Similarly, schedctl_stop() clears the flag set by schedctl_stop() and then checks a "preemption pending" flag in the block. Normally, this will be false, but if set schedctl_stop() will yield to politely grant the CPU to other threads. Note that you can't abuse this facility for long-term preemption avoidance as the deferral is brief. If your thread exceeds the grace period the kernel will preempt it and transiently degrade its effective scheduling priority. Further reading : US05937187 and various papers by Andy Tucker. We'll now switch topics to the implementation of the "synchronized" locking construct in the HotSpot JVM. If a lock is contended then on multiprocessor systems we'll spin briefly to try to avoid context switching. Context switching is wasted work and inflicts various cache and TLB penalties on the threads involved. If context switching were "free" then we'd never spin to avoid switching, but that's not the case. We use an adaptive spin-then-park strategy. One potentially undesirable outcome is that we can be preempted while spinning. When our spinning thread is finally rescheduled the lock may or may not be available. If not, we'll spin and then potentially park (block) again, thus suffering a 2nd context switch. Recall that the reason we spin is to avoid context switching. To avoid this scenario I've found it useful to enable schedctl to request deferral while spinning. But while spinning I've arranged for the code to periodically check or poll the "preemption pending" flag. If that's found set we simply abandon our spinning attempt and park immediately. This avoids the double context-switch scenario above. One annoyance is that the schedctl blocks for the threads in a given process are tightly packed on special pages mapped from kernel space into user-land. As such, writes to the schedctl blocks can cause false sharing on other adjacent blocks. Hopefully the kernel folks will make changes to avoid this by padding and aligning the blocks to ensure that one cache line underlies at most one schedctl block at any one time.

Read the article
My Take on Hadoop World 2011

- by Jean-Pierre Dijcks

I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

Read the article
Why is there a large discrepancy between the stackoverflow tag frequency and the TIOBE Index?

- by Lo Sauer

By recently looking at the TIOBE Programming Community Index (Sep 2012) I noticed the following order: C Java Objective-C C++ C# PHP When looking at the tag frequencies of stackoverflow however, the situation is as follows: C# Java PHP JS Android jquery (JS) iphone (Objective-C) C++ (Java takes the lead when accounting for Android tagged posts w/o a Java tag). JavaScript also likely has surpassed PHP in total numbers of programmers? I realize the tag-frequencies may not be the best indicator, but it is likely a sufficient measure nonetheless. What am I missing that explains this discrepancy, especially for ANSI C?

Read the article
Optimizing Memory Usage in a .NET Application with ANTS Memory Profiler

Most people have encountered an OutOfMemory problem at some point or other, and these people know that tracking down the source of the problem is often a time-consuming and frustrating task. Florian Standhartinger gives us a walkthrough of how he used the ANTS Memory Profiler to help make an otherwise painful task that little bit less troublesome.

Read the article

< Previous Page | 52 53 54 55 56 57 58 59 60 61 62 63 | Next Page >