whitepaper - Page 10 - Developer IT

Unable to cast transparent proxy to type <type>

- by Rick Strahl

This is not the first time I've run into this wonderful error while creating new AppDomains in .NET and then trying to load types and access them across App Domains. In almost all cases the problem I've run into with this error the problem comes from the two AppDomains involved loading different copies of the same type. Unless the types match exactly and come exactly from the same assembly the typecast will fail. The most common scenario is that the types are loaded from different assemblies - as unlikely as that sounds. An Example of Failure To give some context, I'm working on some old code in Html Help Builder that creates a new AppDomain in order to parse assembly information for documentation purposes. I create a new AppDomain in order to load up an assembly process it and then immediately unload it along with the AppDomain. The AppDomain allows for unloading that otherwise wouldn't be possible as well as isolating my code from the assembly that's being loaded. The process to accomplish this is fairly established and I use it for lots of applications that use add-in like functionality - basically anywhere where code needs to be isolated and have the ability to be unloaded. My pattern for this is: Create a new AppDomain Load a Factory Class into the AppDomain Use the Factory Class to load additional types from the remote domain Here's the relevant code from my TypeParserFactory that creates a domain and then loads a specific type - TypeParser - that is accessed cross-AppDomain in the parent domain:public class TypeParserFactory : System.MarshalByRefObject,IDisposable { …/// <summary> /// TypeParser Factory method that loads the TypeParser /// object into a new AppDomain so it can be unloaded. /// Creates AppDomain and creates type. /// </summary> /// <returns></returns> public TypeParser CreateTypeParser() { if (!CreateAppDomain(null)) return null; /// Create the instance inside of the new AppDomain /// Note: remote domain uses local EXE's AppBasePath!!! TypeParser parser = null; try { Assembly assembly = Assembly.GetExecutingAssembly(); string assemblyPath = Assembly.GetExecutingAssembly().Location; parser = (TypeParser) this.LocalAppDomain.CreateInstanceFrom(assemblyPath, typeof(TypeParser).FullName).Unwrap(); } catch (Exception ex) { this.ErrorMessage = ex.GetBaseException().Message; return null; } return parser; } private bool CreateAppDomain(string lcAppDomain) { if (lcAppDomain == null) lcAppDomain = "wwReflection" + Guid.NewGuid().ToString().GetHashCode().ToString("x"); AppDomainSetup setup = new AppDomainSetup(); // *** Point at current directory setup.ApplicationBase = AppDomain.CurrentDomain.BaseDirectory; //setup.PrivateBinPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bin"); this.LocalAppDomain = AppDomain.CreateDomain(lcAppDomain,null,setup); // Need a custom resolver so we can load assembly from non current path AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve); return true; } …} Note that the classes must be either [Serializable] (by value) or inherit from MarshalByRefObject in order to be accessible remotely. Here I need to call methods on the remote object so all classes are MarshalByRefObject. The specific problem code is the loading up a new type which points at an assembly that visible both in the current domain and the remote domain and then instantiates a type from it. This is the code in question:Assembly assembly = Assembly.GetExecutingAssembly(); string assemblyPath = Assembly.GetExecutingAssembly().Location; parser = (TypeParser) this.LocalAppDomain.CreateInstanceFrom(assemblyPath, typeof(TypeParser).FullName).Unwrap(); The last line of code is what blows up with the Unable to cast transparent proxy to type <type> error. Without the cast the code actually returns a TransparentProxy instance, but the cast is what blows up. In other words I AM in fact getting a TypeParser instance back but it can't be cast to the TypeParser type that is loaded in the current AppDomain. Finding the Problem To see what's going on I tried using the .NET 4.0 dynamic type on the result and lo and behold it worked with dynamic - the value returned is actually a TypeParser instance: Assembly assembly = Assembly.GetExecutingAssembly(); string assemblyPath = Assembly.GetExecutingAssembly().Location; object objparser = this.LocalAppDomain.CreateInstanceFrom(assemblyPath, typeof(TypeParser).FullName).Unwrap(); // dynamic works dynamic dynParser = objparser; string info = dynParser.GetVersionInfo(); // method call works // casting fails parser = (TypeParser)objparser; So clearly a TypeParser type is coming back, but nevertheless it's not the right one. Hmmm… mysterious.Another couple of tries reveal the problem however:// works dynamic dynParser = objparser; string info = dynParser.GetVersionInfo(); // method call works // c:\wwapps\wwhelp\wwReflection20.dll (Current Execution Folder) string info3 = typeof(TypeParser).Assembly.CodeBase; // c:\program files\vfp9\wwReflection20.dll (my COM client EXE's folder) string info4 = dynParser.GetType().Assembly.CodeBase; // fails parser = (TypeParser)objparser; As you can see the second value is coming from a totally different assembly. Note that this is even though I EXPLICITLY SPECIFIED an assembly path to load the assembly from! Instead .NET decided to load the assembly from the original ApplicationBase folder. Ouch! How I actually tracked this down was a little more tedious: I added a method like this to both the factory and the instance types and then compared notes:public string GetVersionInfo() { return ".NET Version: " + Environment.Version.ToString() + "\r\n" + "wwReflection Assembly: " + typeof(TypeParserFactory).Assembly.CodeBase.Replace("file:///", "").Replace("/", "\\") + "\r\n" + "Assembly Cur Dir: " + Directory.GetCurrentDirectory() + "\r\n" + "ApplicationBase: " + AppDomain.CurrentDomain.SetupInformation.ApplicationBase + "\r\n" + "App Domain: " + AppDomain.CurrentDomain.FriendlyName + "\r\n"; } For the factory I got: .NET Version: 4.0.30319.239wwReflection Assembly: c:\wwapps\wwhelp\bin\wwreflection20.dllAssembly Cur Dir: c:\wwapps\wwhelpApplicationBase: C:\Programs\vfp9\App Domain: wwReflection534cfa1f For the instance type I got: .NET Version: 4.0.30319.239wwReflection Assembly: C:\\Programs\\vfp9\wwreflection20.dllAssembly Cur Dir: c:\\wwapps\\wwhelpApplicationBase: C:\\Programs\\vfp9\App Domain: wwDotNetBridge_56006605 which clearly shows the problem. You can see that both are loading from different appDomains but the each is loading the assembly from a different location. Probably a better solution yet (for ANY kind of assembly loading problem) is to use the .NET Fusion Log Viewer to trace assembly loads.The Fusion viewer will show a load trace for each assembly loaded and where it's looking to find it. Here's what the viewer looks like: The last trace above that I found for the second wwReflection20 load (the one that is wonky) looks like this:*** Assembly Binder Log Entry (1/13/2012 @ 3:06:49 AM) *** The operation was successful. Bind result: hr = 0x0. The operation completed successfully. Assembly manager loaded from: C:\Windows\Microsoft.NET\Framework\V4.0.30319\clr.dll Running under executable c:\programs\vfp9\vfp9.exe --- A detailed error log follows. === Pre-bind state information === LOG: User = Ras\ricks LOG: DisplayName = wwReflection20, Version=4.61.0.0, Culture=neutral, PublicKeyToken=null (Fully-specified) LOG: Appbase = file:///C:/Programs/vfp9/ LOG: Initial PrivatePath = NULL LOG: Dynamic Base = NULL LOG: Cache Base = NULL LOG: AppName = vfp9.exe Calling assembly : (Unknown). === LOG: This bind starts in default load context. LOG: Using application configuration file: C:\Programs\vfp9\vfp9.exe.Config LOG: Using host configuration file: LOG: Using machine configuration file from C:\Windows\Microsoft.NET\Framework\V4.0.30319\config\machine.config. LOG: Policy not being applied to reference at this time (private, custom, partial, or location-based assembly bind). LOG: Attempting download of new URL file:///C:/Programs/vfp9/wwReflection20.DLL. LOG: Assembly download was successful. Attempting setup of file: C:\Programs\vfp9\wwReflection20.dll LOG: Entering run-from-source setup phase. LOG: Assembly Name is: wwReflection20, Version=4.61.0.0, Culture=neutral, PublicKeyToken=null LOG: Binding succeeds. Returns assembly from C:\Programs\vfp9\wwReflection20.dll. LOG: Assembly is loaded in default load context. WRN: The same assembly was loaded into multiple contexts of an application domain: WRN: Context: Default | Domain ID: 2 | Assembly Name: wwReflection20, Version=4.61.0.0, Culture=neutral, PublicKeyToken=null WRN: Context: LoadFrom | Domain ID: 2 | Assembly Name: wwReflection20, Version=4.61.0.0, Culture=neutral, PublicKeyToken=null WRN: This might lead to runtime failures. WRN: It is recommended to inspect your application on whether this is intentional or not. WRN: See whitepaper http://go.microsoft.com/fwlink/?LinkId=109270 for more information and common solutions to this issue. Notice that the fusion log clearly shows that the .NET loader makes no attempt to even load the assembly from the path I explicitly specified. Remember your Assembly Locations As mentioned earlier all failures I've seen like this ultimately resulted from different versions of the same type being available in the two AppDomains. At first sight that seems ridiculous - how could the types be different and why would you have multiple assemblies - but there are actually a number of scenarios where it's quite possible to have multiple copies of the same assembly floating around in multiple places. If you're hosting different environments (like hosting the Razor Engine, or ASP.NET Runtime for example) it's common to create a private BIN folder and it's important to make sure that there's no overlap of assemblies. In my case of Html Help Builder the problem started because I'm using COM interop to access the .NET assembly and the above code. COM Interop has very specific requirements on where assemblies can be found and because I was mucking around with the loader code today, I ended up moving assemblies around to a new location for explicit loading. The explicit load works in the main AppDomain, but failed in the remote domain as I showed. The solution here was simple enough: Delete the extraneous assembly which was left around by accident. Not a common problem, but one that when it bites is pretty nasty to figure out because it seems so unlikely that types wouldn't match. I know I've run into this a few times and writing this down hopefully will make me remember in the future rather than poking around again for an hour trying to debug the issue as I did today. Hopefully it'll save some of you some time as well in the future.© Rick Strahl, West Wind Technologies, 2005-2012Posted in .NET COM Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

EM12c Release 4: Database as a Service Enhancements

- by Adeesh Fulay

Oracle Enterprise Manager 12.1.0.4 (or simply put EM12c R4) is the latest update to the product. As previous versions, this release provides tons of enhancements and bug fixes, attributing to improved stability and quality. One of the areas that is most exciting and has seen tremendous growth in the last few years is that of Database as a Service. EM12c R4 provides a significant update to Database as a Service. The key themes are: Comprehensive Database Service Catalog (includes single instance, RAC, and Data Guard) Additional Storage Options for Snap Clone (includes support for Database feature CloneDB) Improved Rapid Start Kits Extensible Metering and Chargeback Miscellaneous Enhancements 1. Comprehensive Database Service Catalog Before we get deep into implementation of a service catalog, lets first understand what it is and what benefits it provides. Per ITIL, a service catalog is an exhaustive list of IT services that an organization provides or offers to its employees or customers. Service catalogs have been widely popular in the space of cloud computing, primarily as the medium to provide standardized and pre-approved service definitions. There is already some good collateral out there that talks about Oracle database service catalogs. The two whitepapers i recommend reading are: Service Catalogs: Defining Standardized Database Service High Availability Best Practices for Database Consolidation: The Foundation for Database as a Service [Oracle MAA] EM12c comes with an out-of-the-box service catalog and self service portal since release 1. For the customers, it provides the following benefits: Present a collection of standardized database service definitions, Define standardized pools of hardware and software for provisioning, Role based access to cater to different class of users, Automated procedures to provision the predefined database definitions, Setup chargeback plans based on service tiers and database configuration sizes, etc Starting Release 4, the scope of services offered via the service catalog has been expanded to include databases with varying levels of availability - Single Instance (SI) or Real Application Clusters (RAC) databases with multiple data guard based standby databases. Some salient points of the data guard integration: Standby pools can now be defined across different datacenters or within the same datacenter as the primary (this helps in modelling the concept of near and far DR sites) The standby databases can be single instance, RAC, or RAC One Node databases Multiple standby databases can be provisioned, where the maximum limit is determined by the version of database software The standby databases can be in either mount or read only (requires active data guard option) mode All database versions 10g to 12c supported (as certified with EM 12c) All 3 protection modes can be used - Maximum availability, performance, security Log apply can be set to sync or async along with the required apply lag The different service levels or service tiers are popularly represented using metals - Platinum, Gold, Silver, Bronze, and so on. The Oracle MAA whitepaper (referenced above) calls out the various service tiers as defined by Oracle's best practices, but customers can choose any logical combinations from the table below: Primary Standby [1 or more] EM 12cR4 SI - SI SI RAC - RAC SI RAC RAC RON - RON RON where RON = RAC One Node is supported via custom post-scripts in the service template A sample service catalog would look like the image below. Here we have defined 4 service levels, which have been deployed across 2 data centers, and have 3 standardized sizes. Again, it is important to note that this is just an example to get the creative juices flowing. I imagine each customer would come up with their own catalog based on the application requirements, their RTO/RPO goals, and the product licenses they own. In the screenwatch titled 'Build Service Catalog using EM12c DBaaS', I walk through the complete steps required to setup this sample service catalog in EM12c. 2. Additional Storage Options for Snap Clone In my previous blog posts, i have described the snap clone feature in detail. Essentially, it provides a storage agnostic, self service, rapid, and space efficient approach to solving your data cloning problems. The net benefit is that you get incredible amounts of storage savings (on average 90%) all while cloning databases in a matter of minutes. Space and Time, two things enterprises would love to save on. This feature has been designed with the goal of providing data cloning capabilities while protecting your existing investments in server, storage, and software. With this in mind, we have pursued with the dual solution approach of Hardware and Software. In the hardware approach, we connect directly to your storage appliances and perform all low level actions required to rapidly clone your databases. While in the software approach, we use an intermediate software layer to talk to any storage vendor or any storage configuration to perform the same low level actions. Thus delivering the benefits of database thin cloning, without requiring you to drastically changing the infrastructure or IT's operating style. In release 4, we expand the scope of options supported by snap clone with the addition of database CloneDB. While CloneDB is not a new feature, it was first introduced in 11.2.0.2 patchset, it has over the years become more stable and mature. CloneDB leverages a combination of Direct NFS (or dNFS) feature of the database, RMAN image copies, sparse files, and copy-on-write technology to create thin clones of databases from existing backups in a matter of minutes. It essentially has all the traits that we want to present to our customers via the snap clone feature. For more information on cloneDB, i highly recommend reading the following sources: Blog by Tim Hall: Direct NFS (DNFS) CloneDB in Oracle Database 11g Release 2 Oracle OpenWorld Presentation by Cern: Efficient Database Cloning using Direct NFS and CloneDB The advantages of the new CloneDB integration with EM12c Snap Clone are: Space and time savings Ease of setup - no additional software is required other than the Oracle database binary Works on all platforms Reduce the dependence on storage administrators Cloning process fully orchestrated by EM12c, and delivered to developers/DBAs/QA Testers via the self service portal Uses dNFS to delivers better performance, availability, and scalability over kernel NFS Complete lifecycle of the clones managed by EM12c - performance, configuration, etc 3. Improved Rapid Start Kits DBaaS deployments tend to be complex and its setup requires a series of steps. These steps are typically performed across different users and different UIs. The Rapid Start Kit provides a single command solution to setup Database as a Service (DBaaS) and Pluggable Database as a Service (PDBaaS). One command creates all the Cloud artifacts like Roles, Administrators, Credentials, Database Profiles, PaaS Infrastructure Zone, Database Pools and Service Templates. Once the Rapid Start Kit has been successfully executed, requests can be made to provision databases and PDBs from the self service portal. Rapid start kit can create complex topologies involving multiple zones, pools and service templates. It also supports standby databases and use of RMAN image backups. The Rapid Start Kit in reality is a simple emcli script which takes a bunch of xml files as input and executes the complete automation in a matter of seconds. On a full rack Exadata, it took only 40 seconds to setup PDBaaS end-to-end. This kit works for both Oracle's engineered systems like Exadata, SuperCluster, etc and also on commodity hardware. One can draw parallel to the Exadata One Command script, which again takes a bunch of inputs from the administrators and then runs a simple script that configures everything from network to provisioning the DB software. Steps to use the kit: The kit can be found under the SSA plug-in directory on the OMS: EM_BASE/oracle/MW/plugins/oracle.sysman.ssa.oms.plugin_12.1.0.8.0/dbaas/setup It can be run from this default location or from any server which has emcli client installed For most scenarios, you would use the script dbaas/setup/database_cloud_setup.py For Exadata, special integration is provided to reduce the number of inputs even further. The script to use for this scenario would be dbaas/setup/exadata_cloud_setup.py The database_cloud_setup.py script takes two inputs: Cloud boundary xml: This file defines the cloud topology in terms of the zones and pools along with host names, oracle home locations or container database names that would be used as infrastructure for provisioning database services. This file is optional in case of Exadata, as the boundary is well know via the Exadata system target available in EM. Input xml: This file captures inputs for users, roles, profiles, service templates, etc. Essentially, all inputs required to define the DB services and other settings of the self service portal. Once all the xml files have been prepared, invoke the script as follows for PDBaaS: emcli @database_cloud_setup.py -pdbaas -cloud_boundary=/tmp/my_boundary.xml -cloud_input=/tmp/pdb_inputs.xml The script will prompt for passwords a few times for key users like sysman, cloud admin, SSA admin, etc. Once complete, you can simply log into EM as the self service user and request for databases from the portal. More information available in the Rapid Start Kit chapter in Cloud Administration Guide. 4. Extensible Metering and Chargeback Last but not the least, Metering and Chargeback in release 4 has been made extensible in all possible regards. The new extensibility features allow customer, partners, system integrators, etc to : Extend chargeback to any target type managed in EM Promote any metric in EM as a chargeback entity Extend list of charge items via metric or configuration extensions Model abstract entities like no. of backup requests, job executions, support requests, etc A slew of emcli verbs have also been added that allows administrators to create, edit, delete, import/export charge plans, and assign cost centers all via the command line. More information available in the Chargeback API chapter in Cloud Administration Guide. 5. Miscellaneous Enhancements There are other miscellaneous, yet important, enhancements that are worth a mention. These mostly have been asked by customers like you. These are: Custom naming of DB Services Self service users can provide custom names for DB SID, DB service, schemas, and tablespaces Every custom name is validated for uniqueness in EM 'Create like' of Service Templates Now creating variants of a service template is only a click away. This would be vital when you publish service templates to represent different database sizes or service levels. Profile viewer View the details of a profile like datafile, control files, snapshot ids, export/import files, etc prior to its selection in the service template Cleanup automation - for failed and successful requests Single emcli command to cleanup all remnant artifacts of a failed request Cleanup can be performed on a per request bases or by the entire pool As an extension, you can also delete successful requests Improved delete user workflow Allows administrators to reassign cloud resources to another user or delete all of them Support for multiple tablespaces for schema as a service In addition to multiple schemas, user can also specify multiple tablespaces per request I hope this was a good introduction to the new Database as a Service enhancements in EM12c R4. I encourage you to explore many of these new and existing features and give us feedback. Good luck! References: Cloud Management Page on OTN Cloud Administration Guide [Documentation] -- Adeesh Fulay (@adeeshf)

Read the article

How Mature is Your Database Change Management Process?

- by Ben Rees

.dbd-banner p{ font-size:0.75em; padding:0 0 10px; margin:0 } .dbd-banner p span{ color:#675C6D; } .dbd-banner p:last-child{ padding:0; } @media ALL and (max-width:640px){ .dbd-banner{ background:#f0f0f0; padding:5px; color:#333; margin-top: 5px; } } -- Database Delivery Patterns & Practices Further Reading Organization and team processes How do you get your database schema changes live, on to your production system? As your team of developers and DBAs are working on the changes to the database to support your business-critical applications, how do these updates wend their way through from dev environments, possibly to QA, hopefully through pre-production and eventually to production in a controlled, reliable and repeatable way? In this article, I describe a model we use to try and understand the different stages that customers go through as their database change management processes mature, from the very basic and manual, through to advanced continuous delivery practices. I also provide a simple chart that will help you determine “How mature is our database change management process?” This process of managing changes to the database – which all of us who have worked in application/database development have had to deal with in one form or another – is sometimes known as Database Change Management (even if we’ve never used the term ourselves). And it’s a difficult process, often painfully so. Some developers take the approach of “I’ve no idea how my changes get live – I just write the stored procedures and add columns to the tables. It’s someone else’s problem to get this stuff live. I think we’ve got a DBA somewhere who deals with it – I don’t know, I’ve never met him/her”. I know I used to work that way. I worked that way because I assumed that making the updates to production was a trivial task – how hard can it be? Pause the application for half an hour in the middle of the night, copy over the changes to the app and the database, and switch it back on again? Voila! But somehow it never seemed that easy. And it certainly was never that easy for database changes. Why? Because you can’t just overwrite the old database with the new version. Databases have a state – more specifically 4Tb of critical data built up over the last 12 years of running your business, and if your quick hotfix happened to accidentally delete that 4Tb of data, then you’re “Looking for a new role” pretty quickly after the failed release. There are a lot of other reasons why a managed database change management process is important for organisations, besides job security, not least: Frequency of releases. Many business managers are feeling the pressure to get functionality out to their users sooner, quicker and more reliably. The new book (which I highly recommend) Lean Enterprise by Jez Humble, Barry O’Reilly and Joanne Molesky provides a great discussion on how many enterprises are having to move towards a leaner, more frequent release cycle to maintain their competitive advantage. It’s no longer acceptable to release once per year, leaving your customers waiting all year for changes they desperately need (and expect) Auditing and compliance. SOX, HIPAA and other compliance frameworks have demanded that companies implement proper processes for managing changes to their databases, whether managing schema changes, making sure that the data itself is being looked after correctly or other mechanisms that provide an audit trail of changes. We’ve found, at Red Gate that we have a very wide range of customers using every possible form of database change management imaginable. Everything from “Nothing – I just fix the schema on production from my laptop when things go wrong, and write it down in my notebook” to “A full Continuous Delivery process – any change made by a dev gets checked in and recorded, fully tested (including performance tests) before a (tested) release is made available to our Release Management system, ready for live deployment!”. And everything in between of course. Because of the vast number of customers using so many different approaches we found ourselves struggling to keep on top of what everyone was doing – struggling to identify patterns in customers’ behavior. This is useful for us, because we want to try and fit the products we have to different needs – different products are relevant to different customers and we waste everyone’s time (most notably, our customers’) if we’re suggesting products that aren’t appropriate for them. If someone visited a sports store, looking to embark on a new fitness program, and the store assistant suggested the latest $10,000 multi-gym, complete with multiple weights mechanisms, dumb-bells, pull-up bars and so on, then he’s likely to lose that customer. All he needed was a pair of running shoes! To solve this issue – in an attempt to simplify how we understand our customers and our offerings – we built a model. This is a an attempt at trying to classify our customers in to some sort of model or “Customer Maturity Framework” as we rather grandly term it, which somehow simplifies our understanding of what our customers are doing. The great statistician, George Box (amongst other things, the “Box” in the Box-Jenkins time series model) gave us the famous quote: “Essentially all models are wrong, but some are useful” We’ve taken this quote to heart – we know it’s a gross over-simplification of the real world of how users work with complex legacy and new database developments. Almost nobody precisely fits in to one of our categories. But we hope it’s useful and interesting. There are actually a number of similar models that exist for more general application delivery. We’ve found these from ThoughtWorks/Forrester, from InfoQ and others, and initially we tried just taking these models and replacing the word “application” for “database”. However, we hit a problem. From talking to our customers we know that users are far less further down the road of mature database change management than they are for application development. As a simple example, no application developer, who wants to keep his/her job would develop an application for an organisation without source controlling that code. Sure, he/she might not be using an advanced Gitflow branching methodology but they’ll certainly be making sure their code gets managed in a repo somewhere with all the benefits of history, auditing and so on. But this certainly isn’t the case (yet) for the database – a very large segment of the people we speak to have no source control set up for their databases whatsoever, even at the most basic level (for example, keeping change scripts in a source control system somewhere). By the way, if this is you, Red Gate has a great whitepaper here, on the barriers people face getting a source control process implemented at their organisations. This difference in maturity is the same as you move in to areas such as continuous integration (common amongst app developers, relatively rare for database developers) and automated release management (growing amongst app developers, very rare for the database). So, when we created the model we started from scratch and biased the levels of maturity towards what we actually see amongst our customers. But, what are these stages? And what level are you? The table below describes our definitions for four levels of maturity – Baseline, Beginner, Intermediate and Advanced. As I say, this is a model – you won’t fit any of these categories perfectly, but hopefully one will ring true more than others. We’ve also created a PDF with a flow chart to help you find which of these groups most closely matches your team: Download the Database Delivery Maturity Framework PDF here Level D1 – Baseline Work directly on live databases Sometimes work directly in production Generate manual scripts for releases. Sometimes use a product like SQL Compare or similar to do this Any tests that we might have are run manually Level D2 – Beginner Have some ad-hoc DB version control such as manually adding upgrade scripts to a version control system Attempt is made to keep production in sync with development environments There is some documentation and planning of manual deployments Some basic automated DB testing in process Level D3 – Intermediate The database is fully version-controlled with a product like Red Gate SQL Source Control or SSDT Database environments are managed Production environment schema is reproducible from the source control system There are some automated tests Have looked at using migration scripts for difficult database refactoring cases Level D4 – Advanced Using continuous integration for database changes Build, testing and deployment of DB changes carried out through a proper database release process Fully automated tests Production system is monitored for fast feedback to developers Does this model reflect your team at all? Where are you on this journey? We’d be very interested in knowing how you get on. We’re doing a lot of work at the moment, at Red Gate, trying to help people progress through these stages. For example, if you’re currently not source controlling your database, then this is a natural next step. If you are already source controlling your database, what about the next stage – continuous integration and automated release management? To help understand these issues, there’s a summary of the Red Gate Database Delivery learning program on our site, alongside a Patterns and Practices library here on Simple-Talk and a Training Academy section on our documentation site to help you get up and running with the tools you need to progress. All feedback is welcome and it would be great to hear where you find yourself on this journey! This article is part of our database delivery patterns & practices series on Simple Talk. Find more articles for version control, automated testing, continuous integration & deployment.

Read the article

C#/.NET Little Wonders: ConcurrentBag and BlockingCollection

- by James Michael Hare

In the first week of concurrent collections, began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>. The last post discussed the ConcurrentDictionary<T> . Finally this week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see C#/.NET Little Wonders: A Redux. Recap As you'll recall from the previous posts, the original collections were object-based containers that accomplished synchronization through a Synchronized member. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization. With .NET 4.0, a new breed of collections was born in the System.Collections.Concurrent namespace. Of these, the final concurrent collection we will examine is the ConcurrentBag and a very useful wrapper class called the BlockingCollection. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this informative whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentBag<T> – Thread-safe unordered collection. Unlike the other concurrent collections, the ConcurrentBag<T> has no non-concurrent counterpart in the .NET collections libraries. Items can be added and removed from a bag just like any other collection, but unlike the other collections, the items are not maintained in any order. This makes the bag handy for those cases when all you care about is that the data be consumed eventually, without regard for order of consumption or even fairness – that is, it’s possible new items could be consumed before older items given the right circumstances for a period of time. So why would you ever want a container that can be unfair? Well, to look at it another way, you can use a ConcurrentQueue and get the fairness, but it comes at a cost in that the ordering rules and synchronization required to maintain that ordering can affect scalability a bit. Thus sometimes the bag is great when you want the fastest way to get the next item to process, and don’t care what item it is or how long its been waiting. The way that the ConcurrentBag works is to take advantage of the new ThreadLocal<T> type (new in System.Threading for .NET 4.0) so that each thread using the bag has a list local to just that thread. This means that adding or removing to a thread-local list requires very low synchronization. The problem comes in where a thread goes to consume an item but it’s local list is empty. In this case the bag performs “work-stealing” where it will rob an item from another thread that has items in its list. This requires a higher level of synchronization which adds a bit of overhead to the take operation. So, as you can imagine, this makes the ConcurrentBag good for situations where each thread both produces and consumes items from the bag, but it would be less-than-idea in situations where some threads are dedicated producers and the other threads are dedicated consumers because the work-stealing synchronization would outweigh the thread-local optimization for a thread taking its own items. Like the other concurrent collections, there are some curiosities to keep in mind: IsEmpty(), Count, ToArray(), and GetEnumerator() lock collection Each of these needs to take a snapshot of whole bag to determine if empty, thus they tend to be more expensive and cause Add() and Take() operations to block. ToArray() and GetEnumerator() are static snapshots Because it is based on a snapshot, will not show subsequent updates after snapshot. Add() is lightweight Since adding to the thread-local list, there is very little overhead on Add. TryTake() is lightweight if items in thread-local list As long as items are in the thread-local list, TryTake() is very lightweight, much more so than ConcurrentStack() and ConcurrentQueue(), however if the local thread list is empty, it must steal work from another thread, which is more expensive. Remember, a bag is not ideal for all situations, it is mainly ideal for situations where a process consumes an item and either decomposes it into more items to be processed, or handles the item partially and places it back to be processed again until some point when it will complete. The main point is that the bag works best when each thread both takes and adds items. For example, we could create a totally contrived example where perhaps we want to see the largest power of a number before it crosses a certain threshold. Yes, obviously we could easily do this with a log function, but bare with me while I use this contrived example for simplicity. So let’s say we have a work function that will take a Tuple out of a bag, this Tuple will contain two ints. The first int is the original number, and the second int is the last multiple of that number. So we could load our bag with the initial values (let’s say we want to know the last multiple of each of 2, 3, 5, and 7 under 100. 1: var bag = new ConcurrentBag<Tuple<int, int>> 2: { 3: Tuple.Create(2, 1), 4: Tuple.Create(3, 1), 5: Tuple.Create(5, 1), 6: Tuple.Create(7, 1) 7: }; Then we can create a method that given the bag, will take out an item, apply the multiplier again, 1: public static void FindHighestPowerUnder(ConcurrentBag<Tuple<int,int>> bag, int threshold) 2: { 3: Tuple<int,int> pair; 4: 5: // while there are items to take, this will prefer local first, then steal if no local 6: while (bag.TryTake(out pair)) 7: { 8: // look at next power 9: var result = Math.Pow(pair.Item1, pair.Item2 + 1); 10: 11: if (result < threshold) 12: { 13: // if smaller than threshold bump power by 1 14: bag.Add(Tuple.Create(pair.Item1, pair.Item2 + 1)); 15: } 16: else 17: { 18: // otherwise, we're done 19: Console.WriteLine("Highest power of {0} under {3} is {0}^{1} = {2}.", 20: pair.Item1, pair.Item2, Math.Pow(pair.Item1, pair.Item2), threshold); 21: } 22: } 23: } Now that we have this, we can load up this method as an Action into our Tasks and run it: 1: // create array of tasks, start all, wait for all 2: var tasks = new[] 3: { 4: new Task(() => FindHighestPowerUnder(bag, 100)), 5: new Task(() => FindHighestPowerUnder(bag, 100)), 6: }; 7: 8: Array.ForEach(tasks, t => t.Start()); 9: 10: Task.WaitAll(tasks); Totally contrived, I know, but keep in mind the main point! When you have a thread or task that operates on an item, and then puts it back for further consumption – or decomposes an item into further sub-items to be processed – you should consider a ConcurrentBag as the thread-local lists will allow for quick processing. However, if you need ordering or if your processes are dedicated producers or consumers, this collection is not ideal. As with anything, you should performance test as your mileage will vary depending on your situation! BlockingCollection<T> – A producers & consumers pattern collection The BlockingCollection<T> can be treated like a collection in its own right, but in reality it adds a producers and consumers paradigm to any collection that implements the interface IProducerConsumerCollection<T>. If you don’t specify one at the time of construction, it will use a ConcurrentQueue<T> as its underlying store. If you don’t want to use the ConcurrentQueue, the ConcurrentStack and ConcurrentBag also implement the interface (though ConcurrentDictionary does not). In addition, you are of course free to create your own implementation of the interface. So, for those who don’t remember the producers and consumers classical computer-science problem, the gist of it is that you have one (or more) processes that are creating items (producers) and one (or more) processes that are consuming these items (consumers). Now, the crux of the problem is that there is a bin (queue) where the produced items are placed, and typically that bin has a limited size. Thus if a producer creates an item, but there is no space to store it, it must wait until an item is consumed. Also if a consumer goes to consume an item and none exists, it must wait until an item is produced. The BlockingCollection makes it trivial to implement any standard producers/consumers process set by providing that “bin” where the items can be produced into and consumed from with the appropriate blocking operations. In addition, you can specify whether the bin should have a limited size or can be (theoretically) unbounded, and you can specify timeouts on the blocking operations. As far as your choice of “bin”, for the most part the ConcurrentQueue is the right choice because it is fairly light and maximizes fairness by ordering items so that they are consumed in the same order they are produced. You can use the concurrent bag or stack, of course, but your ordering would be random-ish in the case of the former and LIFO in the case of the latter. So let’s look at some of the methods of note in BlockingCollection: BoundedCapacity returns capacity of the “bin” If the bin is unbounded, the capacity is int.MaxValue. Count returns an internally-kept count of items This makes it O(1), but if you modify underlying collection directly (not recommended) it is unreliable. CompleteAdding() is used to cut off further adds. This sets IsAddingCompleted and begins to wind down consumers once empty. IsAddingCompleted is true when producers are “done”. Once you are done producing, should complete the add process to alert consumers. IsCompleted is true when producers are “done” and “bin” is empty. Once you mark the producers done, and all items removed, this will be true. Add() is a blocking add to collection. If bin is full, will wait till space frees up Take() is a blocking remove from collection. If bin is empty, will wait until item is produced or adding is completed. GetConsumingEnumerable() is used to iterate and consume items. Unlike the standard enumerator, this one consumes the items instead of iteration. TryAdd() attempts add but does not block completely If adding would block, returns false instead, can specify TimeSpan to wait before stopping. TryTake() attempts to take but does not block completely Like TryAdd(), if taking would block, returns false instead, can specify TimeSpan to wait. Note the use of CompleteAdding() to signal the BlockingCollection that nothing else should be added. This means that any attempts to TryAdd() or Add() after marked completed will throw an InvalidOperationException. In addition, once adding is complete you can still continue to TryTake() and Take() until the bin is empty, and then Take() will throw the InvalidOperationException and TryTake() will return false. So let’s create a simple program to try this out. Let’s say that you have one process that will be producing items, but a slower consumer process that handles them. This gives us a chance to peek inside what happens when the bin is bounded (by default, the bin is NOT bounded). 1: var bin = new BlockingCollection<int>(5); Now, we create a method to produce items: 1: public static void ProduceItems(BlockingCollection<int> bin, int numToProduce) 2: { 3: for (int i = 0; i < numToProduce; i++) 4: { 5: // try for 10 ms to add an item 6: while (!bin.TryAdd(i, TimeSpan.FromMilliseconds(10))) 7: { 8: Console.WriteLine("Bin is full, retrying..."); 9: } 10: } 11: 12: // once done producing, call CompleteAdding() 13: Console.WriteLine("Adding is completed."); 14: bin.CompleteAdding(); 15: } And one to consume them: 1: public static void ConsumeItems(BlockingCollection<int> bin) 2: { 3: // This will only be true if CompleteAdding() was called AND the bin is empty. 4: while (!bin.IsCompleted) 5: { 6: int item; 7: 8: if (!bin.TryTake(out item, TimeSpan.FromMilliseconds(10))) 9: { 10: Console.WriteLine("Bin is empty, retrying..."); 11: } 12: else 13: { 14: Console.WriteLine("Consuming item {0}.", item); 15: Thread.Sleep(TimeSpan.FromMilliseconds(20)); 16: } 17: } 18: } Then we can fire them off: 1: // create one producer and two consumers 2: var tasks = new[] 3: { 4: new Task(() => ProduceItems(bin, 20)), 5: new Task(() => ConsumeItems(bin)), 6: new Task(() => ConsumeItems(bin)), 7: }; 8: 9: Array.ForEach(tasks, t => t.Start()); 10: 11: Task.WaitAll(tasks); Notice that the producer is faster than the consumer, thus it should be hitting a full bin often and displaying the message after it times out on TryAdd(). 1: Consuming item 0. 2: Consuming item 1. 3: Bin is full, retrying... 4: Bin is full, retrying... 5: Consuming item 3. 6: Consuming item 2. 7: Bin is full, retrying... 8: Consuming item 4. 9: Consuming item 5. 10: Bin is full, retrying... 11: Consuming item 6. 12: Consuming item 7. 13: Bin is full, retrying... 14: Consuming item 8. 15: Consuming item 9. 16: Bin is full, retrying... 17: Consuming item 10. 18: Consuming item 11. 19: Bin is full, retrying... 20: Consuming item 12. 21: Consuming item 13. 22: Bin is full, retrying... 23: Bin is full, retrying... 24: Consuming item 14. 25: Adding is completed. 26: Consuming item 15. 27: Consuming item 16. 28: Consuming item 17. 29: Consuming item 19. 30: Consuming item 18. Also notice that once CompleteAdding() is called and the bin is empty, the IsCompleted property returns true, and the consumers will exit. Summary The ConcurrentBag is an interesting collection that can be used to optimize concurrency scenarios where tasks or threads both produce and consume items. In this way, it will choose to consume its own work if available, and then steal if not. However, in situations where you want fair consumption or ordering, or in situations where the producers and consumers are distinct processes, the bag is not optimal. The BlockingCollection is a great wrapper around all of the concurrent queue, stack, and bag that allows you to add producer and consumer semantics easily including waiting when the bin is full or empty. That’s the end of my dive into the concurrent collections. I’d also strongly recommend, once again, you read this excellent Microsoft white paper that goes into much greater detail on the efficiencies you can gain using these collections judiciously (here). Tweet Technorati Tags: C#,.NET,Concurrent Collections,Little Wonders

Read the article

C#/.NET Little Wonders: The Concurrent Collections (1 of 3)

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. In the next few weeks, we will discuss the concurrent collections and how they have changed the face of concurrent programming. This week’s post will begin with a general introduction and discuss the ConcurrentStack<T> and ConcurrentQueue<T>. Then in the following post we’ll discuss the ConcurrentDictionary<T> and ConcurrentBag<T>. Finally, we shall close on the third post with a discussion of the BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. A brief history of collections In the beginning was the .NET 1.0 Framework. And out of this framework emerged the System.Collections namespace, and it was good. It contained all the basic things a growing programming language needs like the ArrayList and Hashtable collections. The main problem, of course, with these original collections is that they held items of type object which means you had to be disciplined enough to use them correctly or you could end up with runtime errors if you got an object of a type you weren't expecting. Then came .NET 2.0 and generics and our world changed forever! With generics the C# language finally got an equivalent of the very powerful C++ templates. As such, the System.Collections.Generic was born and we got type-safe versions of all are favorite collections. The List<T> succeeded the ArrayList and the Dictionary<TKey,TValue> succeeded the Hashtable and so on. The new versions of the library were not only safer because they checked types at compile-time, in many cases they were more performant as well. So much so that it's Microsoft's recommendation that the System.Collections original collections only be used for backwards compatibility. So we as developers came to know and love the generic collections and took them into our hearts and embraced them. The problem is, thread safety in both the original collections and the generic collections can be problematic, for very different reasons. Now, if you are only doing single-threaded development you may not care – after all, no locking is required. Even if you do have multiple threads, if a collection is “load-once, read-many” you don’t need to do anything to protect that container from multi-threaded access, as illustrated below: 1: public static class OrderTypeTranslator 2: { 3: // because this dictionary is loaded once before it is ever accessed, we don't need to synchronize 4: // multi-threaded read access 5: private static readonly Dictionary<string, char> _translator = new Dictionary<string, char> 6: { 7: {"New", 'N'}, 8: {"Update", 'U'}, 9: {"Cancel", 'X'} 10: }; 11: 12: // the only public interface into the dictionary is for reading, so inherently thread-safe 13: public static char? Translate(string orderType) 14: { 15: char charValue; 16: if (_translator.TryGetValue(orderType, out charValue)) 17: { 18: return charValue; 19: } 20: 21: return null; 22: } 23: } Unfortunately, most of our computer science problems cannot get by with just single-threaded applications or with multi-threading in a load-once manner. Looking at today's trends, it's clear to see that computers are not so much getting faster because of faster processor speeds -- we've nearly reached the limits we can push through with today's technologies -- but more because we're adding more cores to the boxes. With this new hardware paradigm, it is even more important to use multi-threaded applications to take full advantage of parallel processing to achieve higher application speeds. So let's look at how to use collections in a thread-safe manner. Using historical collections in a concurrent fashion The early .NET collections (System.Collections) had a Synchronized() static method that could be used to wrap the early collections to make them completely thread-safe. This paradigm was dropped in the generic collections (System.Collections.Generic) because having a synchronized wrapper resulted in atomic locks for all operations, which could prove overkill in many multithreading situations. Thus the paradigm shifted to having the user of the collection specify their own locking, usually with an external object: 1: public class OrderAggregator 2: { 3: private static readonly Dictionary<string, List<Order>> _orders = new Dictionary<string, List<Order>>(); 4: private static readonly _orderLock = new object(); 5: 6: public void Add(string accountNumber, Order newOrder) 7: { 8: List<Order> ordersForAccount; 9: 10: // a complex operation like this should all be protected 11: lock (_orderLock) 12: { 13: if (!_orders.TryGetValue(accountNumber, out ordersForAccount)) 14: { 15: _orders.Add(accountNumber, ordersForAccount = new List<Order>()); 16: } 17: 18: ordersForAccount.Add(newOrder); 19: } 20: } 21: } Notice how we’re performing several operations on the dictionary under one lock. With the Synchronized() static methods of the early collections, you wouldn’t be able to specify this level of locking (a more macro-level). So in the generic collections, it was decided that if a user needed synchronization, they could implement their own locking scheme instead so that they could provide synchronization as needed. The need for better concurrent access to collections Here’s the problem: it’s relatively easy to write a collection that locks itself down completely for access, but anything more complex than that can be difficult and error-prone to write, and much less to make it perform efficiently! For example, what if you have a Dictionary that has frequent reads but in-frequent updates? Do you want to lock down the entire Dictionary for every access? This would be overkill and would prevent concurrent reads. In such cases you could use something like a ReaderWriterLockSlim which allows for multiple readers in a lock, and then once a writer grabs the lock it blocks all further readers until the writer is done (in a nutshell). This is all very complex stuff to consider. Fortunately, this is where the Concurrent Collections come in. The Parallel Computing Platform team at Microsoft went through great pains to determine how to make a set of concurrent collections that would have the best performance characteristics for general case multi-threaded use. Now, as in all things involving threading, you should always make sure you evaluate all your container options based on the particular usage scenario and the degree of parallelism you wish to acheive. This article should not be taken to understand that these collections are always supperior to the generic collections. Each fills a particular need for a particular situation. Understanding what each container is optimized for is key to the success of your application whether it be single-threaded or multi-threaded. General points to consider with the concurrent collections The MSDN points out that the concurrent collections all support the ICollection interface. However, since the collections are already synchronized, the IsSynchronized property always returns false, and SyncRoot always returns null. Thus you should not attempt to use these properties for synchronization purposes. Note that since the concurrent collections also may have different operations than the traditional data structures you may be used to. Now you may ask why they did this, but it was done out of necessity to keep operations safe and atomic. For example, in order to do a Pop() on a stack you have to know the stack is non-empty, but between the time you check the stack’s IsEmpty property and then do the Pop() another thread may have come in and made the stack empty! This is why some of the traditional operations have been changed to make them safe for concurrent use. In addition, some properties and methods in the concurrent collections achieve concurrency by creating a snapshot of the collection, which means that some operations that were traditionally O(1) may now be O(n) in the concurrent models. I’ll try to point these out as we talk about each collection so you can be aware of any potential performance impacts. Finally, all the concurrent containers are safe for enumeration even while being modified, but some of the containers support this in different ways (snapshot vs. dirty iteration). Once again I’ll highlight how thread-safe enumeration works for each collection. ConcurrentStack<T>: The thread-safe LIFO container The ConcurrentStack<T> is the thread-safe counterpart to the System.Collections.Generic.Stack<T>, which as you may remember is your standard last-in-first-out container. If you think of algorithms that favor stack usage (for example, depth-first searches of graphs and trees) then you can see how using a thread-safe stack would be of benefit. The ConcurrentStack<T> achieves thread-safe access by using System.Threading.Interlocked operations. This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast! For the most part, the ConcurrentStack<T> behaves like it’s Stack<T> counterpart with a few differences: Pop() was removed in favor of TryPop() Returns true if an item existed and was popped and false if empty. PushRange() and TryPopRange() were added Allows you to push multiple items and pop multiple items atomically. Count takes a snapshot of the stack and then counts the items. This means it is a O(n) operation, if you just want to check for an empty stack, call IsEmpty instead which is O(1). ToArray() and GetEnumerator() both also take snapshots. This means that iteration over a stack will give you a static view at the time of the call and will not reflect updates. Pushing on a ConcurrentStack<T> works just like you’d expect except for the aforementioned PushRange() method that was added to allow you to push a range of items concurrently. 1: var stack = new ConcurrentStack<string>(); 2: 3: // adding to stack is much the same as before 4: stack.Push("First"); 5: 6: // but you can also push multiple items in one atomic operation (no interleaves) 7: stack.PushRange(new [] { "Second", "Third", "Fourth" }); For looking at the top item of the stack (without removing it) the Peek() method has been removed in favor of a TryPeek(). This is because in order to do a peek the stack must be non-empty, but between the time you check for empty and the time you execute the peek the stack contents may have changed. Thus the TryPeek() was created to be an atomic check for empty, and then peek if not empty: 1: // to look at top item of stack without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (stack.TryPeek(out item)) 5: { 6: Console.WriteLine("Top item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Stack was empty."); 11: } Finally, to remove items from the stack, we have the TryPop() for single, and TryPopRange() for multiple items. Just like the TryPeek(), these operations replace Pop() since we need to ensure atomically that the stack is non-empty before we pop from it: 1: // to remove items, use TryPop or TryPopRange to get multiple items atomically (no interleaves) 2: if (stack.TryPop(out item)) 3: { 4: Console.WriteLine("Popped " + item); 5: } 6: 7: // TryPopRange will only pop up to the number of spaces in the array, the actual number popped is returned. 8: var poppedItems = new string[2]; 9: int numPopped = stack.TryPopRange(poppedItems); 10: 11: foreach (var theItem in poppedItems.Take(numPopped)) 12: { 13: Console.WriteLine("Popped " + theItem); 14: } Finally, note that as stated before, GetEnumerator() and ToArray() gets a snapshot of the data at the time of the call. That means if you are enumerating the stack you will get a snapshot of the stack at the time of the call. This is illustrated below: 1: var stack = new ConcurrentStack<string>(); 2: 3: // adding to stack is much the same as before 4: stack.Push("First"); 5: 6: var results = stack.GetEnumerator(); 7: 8: // but you can also push multiple items in one atomic operation (no interleaves) 9: stack.PushRange(new [] { "Second", "Third", "Fourth" }); 10: 11: while(results.MoveNext()) 12: { 13: Console.WriteLine("Stack only has: " + results.Current); 14: } The only item that will be printed out in the above code is "First" because the snapshot was taken before the other items were added. This may sound like an issue, but it’s really for safety and is more correct. You don’t want to enumerate a stack and have half a view of the stack before an update and half a view of the stack after an update, after all. In addition, note that this is still thread-safe, whereas iterating through a non-concurrent collection while updating it in the old collections would cause an exception. ConcurrentQueue<T>: The thread-safe FIFO container The ConcurrentQueue<T> is the thread-safe counterpart of the System.Collections.Generic.Queue<T> class. The concurrent queue uses an underlying list of small arrays and lock-free System.Threading.Interlocked operations on the head and tail arrays. Once again, this allows us to do thread-safe operations without the need for heavy locks! The ConcurrentQueue<T> (like the ConcurrentStack<T>) has some departures from the non-concurrent counterpart. Most notably: Dequeue() was removed in favor of TryDequeue(). Returns true if an item existed and was dequeued and false if empty. Count does not take a snapshot It subtracts the head and tail index to get the count. This results overall in a O(1) complexity which is quite good. It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero. ToArray() and GetEnumerator() both take snapshots. This means that iteration over a queue will give you a static view at the time of the call and will not reflect updates. The Enqueue() method on the ConcurrentQueue<T> works much the same as the generic Queue<T>: 1: var queue = new ConcurrentQueue<string>(); 2: 3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5: queue.Enqueue("Second"); 6: queue.Enqueue("Third"); For front item access, the TryPeek() method must be used to attempt to see the first item if the queue. There is no Peek() method since, as you’ll remember, we can only peek on a non-empty queue, so we must have an atomic TryPeek() that checks for empty and then returns the first item if the queue is non-empty. 1: // to look at first item in queue without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (queue.TryPeek(out item)) 5: { 6: Console.WriteLine("First item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Queue was empty."); 11: } Then, to remove items you use TryDequeue(). Once again this is for the same reason we have TryPeek() and not Peek(): 1: // to remove items, use TryDequeue. If queue is empty returns false. 2: if (queue.TryDequeue(out item)) 3: { 4: Console.WriteLine("Dequeued first item " + item); 5: } Just like the concurrent stack, the ConcurrentQueue<T> takes a snapshot when you call ToArray() or GetEnumerator() which means that subsequent updates to the queue will not be seen when you iterate over the results. Thus once again the code below will only show the first item, since the other items were added after the snapshot. 1: var queue = new ConcurrentQueue<string>(); 2: 3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5: 6: var iterator = queue.GetEnumerator(); 7: 8: queue.Enqueue("Second"); 9: queue.Enqueue("Third"); 10: 11: // only shows First 12: while (iterator.MoveNext()) 13: { 14: Console.WriteLine("Dequeued item " + iterator.Current); 15: } Using collections concurrently You’ll notice in the examples above I stuck to using single-threaded examples so as to make them deterministic and the results obvious. Of course, if we used these collections in a truly multi-threaded way the results would be less deterministic, but would still be thread-safe and with no locking on your part required! For example, say you have an order processor that takes an IEnumerable<Order> and handles each other in a multi-threaded fashion, then groups the responses together in a concurrent collection for aggregation. This can be done easily with the TPL’s Parallel.ForEach(): 1: public static IEnumerable<OrderResult> ProcessOrders(IEnumerable<Order> orderList) 2: { 3: var proxy = new OrderProxy(); 4: var results = new ConcurrentQueue<OrderResult>(); 5: 6: // notice that we can process all these in parallel and put the results 7: // into our concurrent collection without needing any external locking! 8: Parallel.ForEach(orderList, 9: order => 10: { 11: var result = proxy.PlaceOrder(order); 12: 13: results.Enqueue(result); 14: }); 15: 16: return results; 17: } Summary Obviously, if you do not need multi-threaded safety, you don’t need to use these collections, but when you do need multi-threaded collections these are just the ticket! The plethora of features (I always think of the movie The Three Amigos when I say plethora) built into these containers and the amazing way they acheive thread-safe access in an efficient manner is wonderful to behold. Stay tuned next week where we’ll continue our discussion with the ConcurrentBag<T> and the ConcurrentDictionary<TKey,TValue>. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. Tweet Technorati Tags: C#,.NET,Concurrent Collections,Collections,Multi-Threading,Little Wonders,BlackRabbitCoder,James Michael Hare

Read the article

C#/.NET Little Wonders: The ConcurrentDictionary

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns. Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>. Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!). Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. Recap As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member. While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization. This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work. With .NET 4.0, we get the best of both worlds in generic collections. A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace. These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access. They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms. Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist. As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()). Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentDictionary – the fully thread-safe dictionary The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection. Obviously, both are designed for quick – O(1) – lookups of data based on a key. If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go. Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively. While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search. Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking. Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference. In such cases locking is completely non-productive. However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates. This is where the ConcurrentDictionary really shines! It achieves its thread-safety with no common lock to improve efficiency. It actually uses a series of locks to provide concurrent updates, and has lockless reads! This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have. ConcurrentDictionary and Dictionary differences For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences. Some notable examples of which are: Add() does not exist in the concurrent dictionary. This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd(). It also means that you can’t use a collection initializer with the concurrent dictionary. TryAdd() replaced Add() to attempt atomic, safe adds. Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock. TryUpdate() was added to attempt atomic, safe updates. If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be. If all these are true, we can update the item under one atomic step. TryRemove() was added to attempt atomic, safe removes. To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock. AddOrUpdate() was added to attempt an thread-safe “upsert”. There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does. This allows you to make a thread-safe add-or-update. GetOrAdd() was added to attempt an thread-safe query/insert. Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it. This allows you to get the value if it exists and insert if not. Count, Keys, Values properties take a snapshot of the dictionary. Accessing these properties may interfere with add and update performance and should be used with caution. ToArray() returns a static snapshot of the dictionary. That is, the dictionary is locked, and then copied to an array as a O(n) operation. GetEnumerator() is thread-safe and efficient, but allows dirty reads. Because reads require no locking, you can safely iterate over the contents of the dictionary. The only downside is that, depending on timing, you may get dirty reads. Dirty reads during iteration The last point on GetEnumerator() bears some explanation. Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated. This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly. The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update. Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them. Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this: 1: // load up a dictionary. 2: var dictionary = new ConcurrentDictionary<string, int>(); 3: 4: dictionary["A"] = 1; 5: dictionary["B"] = 2; 6: dictionary["C"] = 3; 7: dictionary["D"] = 4; 8: dictionary["E"] = 5; 9: dictionary["F"] = 6; Then you have one task (using the wonderful TPL!) to iterate using dirty reads: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); And one task to attempt updates in a separate thread (probably): 1: // attempt updates in a separate thread 2: var updateTask = new Task(() => 3: { 4: // iterates, and updates the value by one 5: foreach (var pair in dictionary) 6: { 7: dictionary[pair.Key] = pair.Value + 1; 8: } 9: }); Now that we’ve done this, we can fire up both tasks and wait for them to complete: 1: // start both tasks 2: updateTask.Start(); 3: iterationTask.Start(); 4: 5: // wait for both to complete. 6: Task.WaitAll(updateTask, iterationTask); Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6). However, because the reads are dirty, we will quite possibly get a combination of some updated, some original. My own run netted this result: 1: F:6 2: E:6 3: D:5 4: C:4 5: B:3 6: A:2 Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered. Also note that both E and F show the value 6. This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively). If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary.ToArray()) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); The atomic Try…() methods As you can imagine TryAdd() and TryRemove() have few surprises. Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary: 1: // try add attempts an add and returns false if it already exists 2: if (dictionary.TryAdd("G", 7)) 3: Console.WriteLine("G did not exist, now inserted with 7"); 4: else 5: Console.WriteLine("G already existed, insert failed."); TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key: 1: // attempt to remove the value, if it exists it is removed and the original is returned 2: int removedValue; 3: if (dictionary.TryRemove("C", out removedValue)) 4: Console.WriteLine("Removed C and its value was " + removedValue); 5: else 6: Console.WriteLine("C did not exist, remove failed."); Now TryUpdate() is an interesting creature. You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false. Well, note quite... It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update. If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true. If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned. 1: // attempt to update the value, if it exists and if it has the expected original value 2: if (dictionary.TryUpdate("G", 42, 7)) 3: Console.WriteLine("G existed and was 7, now it's 42."); 4: else 5: Console.WriteLine("G either didn't exist, or wasn't 7."); The composite Add methods The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get. The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does. For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed: 1: public sealed class SubscriptionManager 2: { 3: private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public void AddSubscription(string tickerKey) 7: { 8: // add a new subscription with count of 1, or update existing count by 1 if exists 9: var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1); 10: 11: // now check the result to see if we just incremented the count, or inserted first count 12: if (resultCount == 1) 13: { 14: // subscribe to symbol... 15: } 16: } 17: } Notice the update value factory Func delegate. If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary. The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated). Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value. This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time: 1: public sealed class PriceCache 2: { 3: private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public double QueryPrice(string tickerKey) 7: { 8: // check for the price in the cache, if it doesn't exist it will call the delegate to create value. 9: return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol)); 10: } 11: 12: private double GetCurrentPrice(string tickerKey) 13: { 14: // do code to calculate actual true price. 15: } 16: } There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same. Oddities with the composite Add methods The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic. It is important to note that the methods that use delegates execute those delegates outside of the lock. This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads. This is not necessarily an issue, per se, but it is something you must consider in your design. The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned! Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate. Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary. Now A is done and goes to get the lock, and now sees that the item now exists. In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it. Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int. And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc). So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked): 1: public static class Program 2: { 3: private static int _nextNumber = 0; 4: 5: // the holder of the char to ordinal 6: private static ConcurrentDictionary<char, int> _dictionary 7: = new ConcurrentDictionary<char, int>(); 8: 9: // get the next id value 10: public static int NextId 11: { 12: get { return Interlocked.Increment(ref _nextNumber); } 13: } Then, we add a method that will perform our insert: 1: public static void Inserter() 2: { 3: for (int i = 0; i < 26; i++) 4: { 5: _dictionary.GetOrAdd((char)('A' + i), key => NextId); 6: } 7: } Finally, we run our test by starting two tasks to do this work and get the results… 1: public static void Main() 2: { 3: // 3 tasks attempting to get/insert 4: var tasks = new List<Task> 5: { 6: new Task(Inserter), 7: new Task(Inserter) 8: }; 9: 10: tasks.ForEach(t => t.Start()); 11: Task.WaitAll(tasks.ToArray()); 12: 13: foreach (var pair in _dictionary.OrderBy(p => p.Key)) 14: { 15: Console.WriteLine(pair.Key + ":" + pair.Value); 16: } 17: } If you run this with only one task, you get the expected A:1, B:2, ..., Z:26. But running this in parallel you will get something a bit more complex. My run netted these results: 1: A:1 2: B:3 3: C:4 4: D:5 5: E:6 6: F:7 7: G:8 8: H:9 9: I:10 10: J:11 11: K:12 12: L:13 13: M:14 14: N:15 15: O:16 16: P:17 17: Q:18 18: R:19 19: S:20 20: T:21 21: U:22 22: V:23 23: W:24 24: X:25 25: Y:26 26: Z:27 Notice that B is 3? This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3. However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded. This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time. As such, it is probably a good idea to keep those generators as stateless as possible. Summary The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection. It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently. Tweet Technorati Tags: C#, .NET, Concurrent Collections, Collections, Little Wonders, Black Rabbit Coder,James Michael Hare

Developer IT