unsorted - Page 4 - Developer IT

How does java.util.Collections.contains() perform faster than a linear search?

- by The111

I've been fooling around with a bunch of different ways of searching collections, collections of collections, etc. Doing lots of stupid little tests to verify my understanding. Here is one which boggles me (source code further below). In short, I am generating N random integers and adding them to a list. The list is NOT sorted. I then use Collections.contains() to look for a value in the list. I intentionally look for a value that I know won't be there, because I want to ensure that the entire list space is probed. I time this search. I then do another linear search manually, iterating through each element of the list and checking if it matches my target. I also time this search. On average, the second search takes 33% longer than the first one. By my logic, the first search must also be linear, because the list is unsorted. The only possibility I could think of (which I immediately discard) is that Java is making a sorted copy of my list just for the search, but (1) I did not authorize that usage of memory space and (2) I would think that would result in MUCH more significant time savings with such a large N. So if both searches are linear, they should both take the same amount of time. Somehow the Collections class has optimized this search, but I can't figure out how. So... what am I missing? import java.util.*; public class ListSearch { public static void main(String[] args) { int N = 10000000; // number of ints to add to the list int high = 100; // upper limit for random int generation List<Integer> ints; int target = -1; // target will not be found, forces search of entire list space long start; long end; ints = new ArrayList<Integer>(); start = System.currentTimeMillis(); System.out.print("Generating new list... "); for (int i = 0; i < N; i++) { ints.add(((int) (Math.random() * high)) + 1); } end = System.currentTimeMillis(); System.out.println("took " + (end-start) + "ms."); start = System.currentTimeMillis(); System.out.print("Searching list for target (method 1)... "); if (ints.contains(target)) { // nothing } end = System.currentTimeMillis(); System.out.println(" Took " + (end-start) + "ms."); System.out.println(); ints = new ArrayList<Integer>(); start = System.currentTimeMillis(); System.out.print("Generating new list... "); for (int i = 0; i < N; i++) { ints.add(((int) (Math.random() * high)) + 1); } end = System.currentTimeMillis(); System.out.println("took " + (end-start) + "ms."); start = System.currentTimeMillis(); System.out.print("Searching list for target (method 2)... "); for (Integer i : ints) { // nothing } end = System.currentTimeMillis(); System.out.println(" Took " + (end-start) + "ms."); } }

Read the article

C#/.NET Little Wonders: The Useful But Overlooked Sets

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>. Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items. In other words, the set {2,3,5} is identical to the set {3,5,2}. In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2). In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both. Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set. Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets. Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation? The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order. Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for. Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list. If it is, great, we already have that market data feed! If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it. This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6: 7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4: 5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13: 14: // place the order logic! 15: } What’s wrong with this? In short: performance! Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion. Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long. Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection. Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage. So performing a sort after each add would probably add more time. As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant. The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface. As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>. That is, a Dictionary where the TKey and TValue refer to the same object. This is really an oversimplification, but logically it makes sense. I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups! Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n) Now, these times are amortized and represent the typical case. In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains! This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection. Compare this to the List where almost any add or remove must rearrange potentially all the elements! And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T> is change our declaration. Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4: 5: // ... 6: 7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22: 23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>. Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order. Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally. Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes. There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>. Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n) The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error. There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity. Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection. Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection. This is very good performance! It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000 Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count. This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search). So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets. Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets. Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets. Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets. You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order. That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it. This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection. Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value. In addition, if you have need for the mathematical set operations, the C# sets support those as well. The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order. In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance. Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

Read the article

CodePlex Daily Summary for Monday, November 07, 2011

CodePlex Daily Summary for Monday, November 07, 2011Popular ReleasesGoogleMap Control: GoogleMap Control 6.0: Major design changes to the control in order to achieve better scalability and extensibility for the new features comming with GoogleMaps API. GoogleMap control switched to GoogleMaps API v3 and .NET 4.0. GoogleMap control is 100% ScriptControl now, it requires ScriptManager to be registered on the pages where and before it is used. Markers, polylines, polygons and directions were implemented as ExtenderControl, instead of being inner properties of GoogleMap control. Better perfomance. Better...WDTVHubGen - Adds Metadata, thumbnails and subtitles to WDTV Live Hubs: V2.1: Version 2.1 (click on the right) this uses V4.0 of .net Version 2.1 adds the following features: (apologize if I forget some, added a lot of little things) Manual Lookup with TV or Movie (finally huh!), you can look up a movie or TV episode directly, you can right click on anythign, and choose manual lookup, then will allow you to type anything you want to look up and it will assign it to the file you right clicked. No Rename: a very popular request, this is an option you can set so that t...Bulk Copy Test Cases Tool for Microsoft Test Manager & TFS: Bulk Copy Test Cases Tool: A while ago I had written a blog post Microsoft Test Manager Test Case Versioning on how to manage Test Cases over multiple releases which required you to manually copy test cases individually. Now there is a tool to help with the bulk copying of Test Cases that updates the Iteration field at the same time.Self-Tracking Entity Generator for WPF and Silverlight: Self-Tracking Entity Generator v 0.9.9 Update 2: Self-Tracking Entity Generator v 0.9.9 for Entity Framework 4.0. No change to the self-tracking entity generator v 0.9.9. WPF sample (SchoolSample) is updated with unit testing for both ViewModel and Model classes.SubExtractor: Release 1020: Feature: added "baseline double quotes" character to selector box Feature: added option to save SRT files as ANSI (instead of previous UTF-8 only) Feature: made "Save Sup files to Source directory" apply to both Sup and Idx source files. Fix: removed SDH text (...) or [...] that is split over 2 lines Fix: better decision-making in when to prefix a line with a '-' because SDH was removedAcDown????? - Anime&Comic Downloader: AcDown????? v3.6.1: ?? ● AcDown??????????、??????，??????????????????????，???????Acfun、Bilibili、???、???、???、Tucao.cc、SF???、?????80????，???????????、?????????。 ● AcDown???????????????????????????，???，???????????????????。 ● AcDown???????C#??，????.NET Framework 2.0??。?????"Acfun?????"。 ????32??64? Windows XP/Vista/7 ????????????? ??：????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64)，?????"?????????"??? ??????????????，??????????： ??"AcDown?????"????????? ?? v3.6.1?? ??.hlv...Track Folder Changes: Track Folder Changes 1.1: Fixed exception when right-clicking the root nodeKinect Toolbox: Kinect Toolbox v1.1.0.2: This version adds support for the Kinect for Windows SDK beta 2.MapWindow 4: MapWindow GIS v4.8.6 - Final release - 32Bit: This is the final release of MapWindow v4.8. It has 4.8.6 as version number. This version has been thoroughly tested. If you do get an exception send the exception to us. Don't forget to include your e-mail address. Use the forums at http://www.mapwindow.org/phorum/ for questions. Please consider donating a small portion of the money you have saved by having free GIS tools: http://www.mapwindow.org/pages/donate.php What’s New in 4.8.6 (Final release) · A few minor issues have been fixed Wha...Kinect Mouse Cursor: Kinect Mouse Cursor 1.1: Updated for Kinect for Windows SDK v1.0 Beta 2!Coding4Fun Kinect Toolkit: Coding4Fun Kinect Toolkit 1.1: Updated for Kinect for Windows SDK v1.0 Beta 2!Async Executor: 1.0: Source code of the AsyncExecutorMedia Companion: MC 3.421b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) TV Show Resolutions... Fix to show the season-specials.tbn when selecting an episode from season 00. Before, MC would try & load season00.tbn Fix for issue #197 - new show added by 'Manually Add Path' not being picked up. Also made non-visible the same thing in Root Folders...Nearforums - ASP.NET MVC forum engine: Nearforums v7.0: Version 7.0 of Nearforums, the ASP.NET MVC Forum Engine, containing new features: UI: Flexible layout to handle both list and table-like template layouts. Theming - Visual choice of themes: Deliver some templates on installation, export/import functionality, preview. Allow site owners to choose default list sort order for the forums. Forum latest activity. Visit the project Roadmap for more details. Webdeploy packages sha1 checksum: e6bb913e591543ab292a753d1a16cdb779488c10?????????? - ????????: All-In-One Code Framework ??? 2011-11-02: http://download.codeplex.com/Project/Download/FileDownload.aspx?ProjectName=1codechs&DownloadId=216140 ??????，11??，?????20????Microsoft OneCode Sample，????6?Program Language Sample，2?Windows Base Sample，2?GDI+ Sample，4?Internet Explorer Sample?6?ASP.NET Sample。?????????????。 ????，?????。http://i3.codeplex.com/Project/Download/FileDownload.aspx?ProjectName=1code&DownloadId=128165 Program Language CSImageFullScreenSlideShow VBImageFullScreenSlideShow CSDynamicallyBuildLambdaExpressionWithFie...Python Tools for Visual Studio: 1.1 Alpha: We’re pleased to announce the release of Python Tools for Visual Studio 1.1 Alpha. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python programming language. This release includes new core IDE features, a couple of new sample libraries for interacting with Kinect and Excel, and many bug fixes for issues reported since the release of 1.0. For the core IDE features we’ve added many new features which improve the basic edit...BExplorer (Better Explorer): Better Explorer 2.0.0.631 Alpha: Changelog: Added: Some new functions in ribbon Added: Possibility to choose displayed columns Added: Basic Search Fixed: Some bugs after navigation Fixed: Attempt to fix slow navigation and slow start Known issues: - BreadcrumbBar fails on some situations - Basic search not work quite well in some situations Please if anyone find bugs be kind and report them at the Issue Tracker! Thanks!DotNetNuke® Community Edition: 05.06.04: Major Highlights Fixed issue with upgrades on systems that had upgraded the Telerik library to 6.0.0 Fixed issue with Razor Host upgrade to 5.6.3 The logic for module administration checks contains incorrect logic in 1 place, opening the possibility of a user with edit permissions gaining access to functionality they should not have through a particularly crafted url Security FixesBrowsers support the ability to remember common strings such as usernames/addresses etc. Code was adde...Terminals: Version 2.0 - Beta 3 Release: Beta 3 Refresh Dont forget to backup your config files BEFORE upgrading! The team has finally put the nail into the official release date for version 2.0. As bugs are winding down on the 2.0 Roadmap we decided to push out another build - the first 2.0 Beta build. Please take time to use and abuse this release. We left logging in place, and this is a debug build so be sure to submit your logs on each bug reported, and please do report all bugs! Check the source code page on the site, th...iTuner - The iTunes Companion: iTuner 1.4.4322: Added German (unverified, apologies if incorrect) Properly source invariant resources with correct resIDs Replaced obsolete lyric providers with working providers Fix Pseudolater to correctly morph every third char Fix null reference in CatalogBaseNew ProjectsA Blog: This is a blog plus personal web page frameworkAccess 1-D Intersection: This is an Access VBA Module containing functions that allow make it easy to determine overlaps in 1-D intervals. For instance if table A contains a range of 0-7 and Table B contains a range of 5-10, the intersection is 5-7.AkismetPC: A C# implementation of the popular anti-spam plugin Akismet. There aren't many .NET versions of Akismet so I decided to write one and that can be used with .NET blog engines such as Subtext, etc.AlertMonkey: A multicast chat client that enables users to send html, images, sounds, and files to connected users. Provides specialized alert types such as lunch and happy hour, as well as channel support.Azzeton: azzetonBKWork: private project.Blue: Blue is a web application for italian baseball and softball umpires.Build Javascript Models from .Net Classes: Build JavaScript Data Models from .Net Classes automaticallycmpp: cmppCRM 2011 TreeView for Dependent Picklist: This utility will allow CRM Customizer to configure Dependent Picklist items which will be shown as TreeView control on CRM form.DirSign: DirSign is a console exe that evaluates or checks directory signature. DirSign is used to check if something in a directory tree has changed (a file date or a file size or a new or missing file). You can use DirSign in scenario where you need to check if something changed since last time but where you can't install a file system watcher.epictactics: Game for WP7Export SharePoint 2010 External List to Excel: Export SharePoint 2010 external list to Excel with custom ribbon plugin. Export current external list with selected view to office 97 - 2003 or office 2007 - 2010.Floridum: Project for a XML Database.GNU ISO8583: GISO (GNU ISO) is a tool that makes it easier to analyze ISO 8583 financial transactions and also provides a platform to create a host simulator, capable of receiving requests and sending back the responses. It’s a WinForms application and it’s developed using C#.G's Syndication Pocket: G's Syndication Pocket is simple RSS Aggregate application. This is suitable for .NET Compact Framework. I checked it on Sharp's W-ZERO3.Hatena Netfx Library: .NET Library for Hatena Services.inohigo: a programming language that was developed by inohiro.Internet Cache Examiner: Internet Cache Examiner allows Internet Explorer INDEX.DAT files to be read directly, allowing the extraction of more information than is displayed in Internet Explorer, and without being limited to viewing only the activity of the current user. It's developed in C#.Javascript to IQueryable: javascript to IQueryable is an implementation that allows to write a simple query in javascript and then execute it on the server with EntityFramework or a linq provider that implement IQueryable.kisd: Just my code, wanted to keep it safe.LUCA UI for Silverlight 4: LUCA UI is a collection of flexible layout controls for Silverlight 4. Basically, using these controls you can create the same type of user-definable UI that Visual Studio and Expression Blend have.Messenger Game - Starter Kit: Kom godt i gang med at lave spil til Messenger med dette komplette Starter Kit. Indeholder et komplet netværksspil lavet med Messenger Activity API og Silverlight.Music Keys: Music KeysMyNote: MyNoteOpen Source Data System: DataSystem is a file based database system that is thread safe. It is a dynamically generated database meaning developers can either structure it outside the application prior or development. PhotoDesktop: Create background images for your desktop using hundreds of your photos off your local computer. (coming soon - use flickr [or other RSS] feeds)SharePoint Backup Augmentation Cmdlets: The SharePoint Backup Augmentation Cmdlets (SharePointBAC) provide administrators with additional PowerShell cmdlets to complement and extend SharePoint 2010's native backup and restore capabilities. SharePointBAC makes it possible to groom backup sets, archive backups, and more.SharpClassifier: C "Classifier" is an AI software component that tries to classify instances from given evidence (if shiny then diamond). A famous example is classifying email spam, separating it from ham. SharpClassifier currently only contains a single classifier - A Bayesian Naive Classifier. Most Bayesian Naive Classifiers for C# you'll find out there only handles two classes (spam/ham), but this implementation supports any number of classses.Shell Sort Web service and Application: this is a webservice of Sorting methode. use Shell sort methode to sorthing a unsorted number, and it can give a boundary as you input this project is made by Information System students, Ma Chung University , Malang - East Java - Indonesia [url:www.Machung.ac.id] Anna Letizia & SetiawanEka Prayuda Barbiezztissa@gmail.com & setya_09@hotmail.comSistema UELS: adsfasdfSorting Number use Insertion Sort on Web Service: This program can simulate the insertion sort easily.TA_Sorted_App01: First implementation of TA_Sorted Algorithm ThinkDeeper MVC framework: ThinkDeeper MVC is a WPF MVC for .NET 3.5. Typing Game: The Nottingham Game Developer's first game.xBlog: xBlog is a project to build a simple and extensible Blog Engine based on xml and linqXNA DebugDrawer Using Spritebatch: This project serves to show how to draw lines and rectangles using XNA's Spritebatch. This project uses XNA 4.0 and C# programming languageYet another Scedule Planner: YASP - Yet another Scedule Planner

Read the article

C#/.NET Little Wonders: The ConcurrentDictionary

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns. Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>. Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!). Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. Recap As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member. While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization. This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work. With .NET 4.0, we get the best of both worlds in generic collections. A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace. These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access. They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms. Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist. As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()). Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentDictionary – the fully thread-safe dictionary The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection. Obviously, both are designed for quick – O(1) – lookups of data based on a key. If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go. Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively. While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search. Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking. Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference. In such cases locking is completely non-productive. However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates. This is where the ConcurrentDictionary really shines! It achieves its thread-safety with no common lock to improve efficiency. It actually uses a series of locks to provide concurrent updates, and has lockless reads! This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have. ConcurrentDictionary and Dictionary differences For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences. Some notable examples of which are: Add() does not exist in the concurrent dictionary. This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd(). It also means that you can’t use a collection initializer with the concurrent dictionary. TryAdd() replaced Add() to attempt atomic, safe adds. Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock. TryUpdate() was added to attempt atomic, safe updates. If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be. If all these are true, we can update the item under one atomic step. TryRemove() was added to attempt atomic, safe removes. To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock. AddOrUpdate() was added to attempt an thread-safe “upsert”. There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does. This allows you to make a thread-safe add-or-update. GetOrAdd() was added to attempt an thread-safe query/insert. Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it. This allows you to get the value if it exists and insert if not. Count, Keys, Values properties take a snapshot of the dictionary. Accessing these properties may interfere with add and update performance and should be used with caution. ToArray() returns a static snapshot of the dictionary. That is, the dictionary is locked, and then copied to an array as a O(n) operation. GetEnumerator() is thread-safe and efficient, but allows dirty reads. Because reads require no locking, you can safely iterate over the contents of the dictionary. The only downside is that, depending on timing, you may get dirty reads. Dirty reads during iteration The last point on GetEnumerator() bears some explanation. Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated. This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly. The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update. Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them. Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this: 1: // load up a dictionary. 2: var dictionary = new ConcurrentDictionary<string, int>(); 3: 4: dictionary["A"] = 1; 5: dictionary["B"] = 2; 6: dictionary["C"] = 3; 7: dictionary["D"] = 4; 8: dictionary["E"] = 5; 9: dictionary["F"] = 6; Then you have one task (using the wonderful TPL!) to iterate using dirty reads: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); And one task to attempt updates in a separate thread (probably): 1: // attempt updates in a separate thread 2: var updateTask = new Task(() => 3: { 4: // iterates, and updates the value by one 5: foreach (var pair in dictionary) 6: { 7: dictionary[pair.Key] = pair.Value + 1; 8: } 9: }); Now that we’ve done this, we can fire up both tasks and wait for them to complete: 1: // start both tasks 2: updateTask.Start(); 3: iterationTask.Start(); 4: 5: // wait for both to complete. 6: Task.WaitAll(updateTask, iterationTask); Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6). However, because the reads are dirty, we will quite possibly get a combination of some updated, some original. My own run netted this result: 1: F:6 2: E:6 3: D:5 4: C:4 5: B:3 6: A:2 Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered. Also note that both E and F show the value 6. This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively). If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary.ToArray()) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); The atomic Try…() methods As you can imagine TryAdd() and TryRemove() have few surprises. Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary: 1: // try add attempts an add and returns false if it already exists 2: if (dictionary.TryAdd("G", 7)) 3: Console.WriteLine("G did not exist, now inserted with 7"); 4: else 5: Console.WriteLine("G already existed, insert failed."); TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key: 1: // attempt to remove the value, if it exists it is removed and the original is returned 2: int removedValue; 3: if (dictionary.TryRemove("C", out removedValue)) 4: Console.WriteLine("Removed C and its value was " + removedValue); 5: else 6: Console.WriteLine("C did not exist, remove failed."); Now TryUpdate() is an interesting creature. You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false. Well, note quite... It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update. If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true. If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned. 1: // attempt to update the value, if it exists and if it has the expected original value 2: if (dictionary.TryUpdate("G", 42, 7)) 3: Console.WriteLine("G existed and was 7, now it's 42."); 4: else 5: Console.WriteLine("G either didn't exist, or wasn't 7."); The composite Add methods The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get. The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does. For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed: 1: public sealed class SubscriptionManager 2: { 3: private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public void AddSubscription(string tickerKey) 7: { 8: // add a new subscription with count of 1, or update existing count by 1 if exists 9: var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1); 10: 11: // now check the result to see if we just incremented the count, or inserted first count 12: if (resultCount == 1) 13: { 14: // subscribe to symbol... 15: } 16: } 17: } Notice the update value factory Func delegate. If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary. The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated). Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value. This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time: 1: public sealed class PriceCache 2: { 3: private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>(); 4: 5: // adds a new subscription, or increments the count of the existing one. 6: public double QueryPrice(string tickerKey) 7: { 8: // check for the price in the cache, if it doesn't exist it will call the delegate to create value. 9: return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol)); 10: } 11: 12: private double GetCurrentPrice(string tickerKey) 13: { 14: // do code to calculate actual true price. 15: } 16: } There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same. Oddities with the composite Add methods The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic. It is important to note that the methods that use delegates execute those delegates outside of the lock. This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads. This is not necessarily an issue, per se, but it is something you must consider in your design. The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned! Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate. Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary. Now A is done and goes to get the lock, and now sees that the item now exists. In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it. Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int. And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc). So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked): 1: public static class Program 2: { 3: private static int _nextNumber = 0; 4: 5: // the holder of the char to ordinal 6: private static ConcurrentDictionary<char, int> _dictionary 7: = new ConcurrentDictionary<char, int>(); 8: 9: // get the next id value 10: public static int NextId 11: { 12: get { return Interlocked.Increment(ref _nextNumber); } 13: } Then, we add a method that will perform our insert: 1: public static void Inserter() 2: { 3: for (int i = 0; i < 26; i++) 4: { 5: _dictionary.GetOrAdd((char)('A' + i), key => NextId); 6: } 7: } Finally, we run our test by starting two tasks to do this work and get the results… 1: public static void Main() 2: { 3: // 3 tasks attempting to get/insert 4: var tasks = new List<Task> 5: { 6: new Task(Inserter), 7: new Task(Inserter) 8: }; 9: 10: tasks.ForEach(t => t.Start()); 11: Task.WaitAll(tasks.ToArray()); 12: 13: foreach (var pair in _dictionary.OrderBy(p => p.Key)) 14: { 15: Console.WriteLine(pair.Key + ":" + pair.Value); 16: } 17: } If you run this with only one task, you get the expected A:1, B:2, ..., Z:26. But running this in parallel you will get something a bit more complex. My run netted these results: 1: A:1 2: B:3 3: C:4 4: D:5 5: E:6 6: F:7 7: G:8 8: H:9 9: I:10 10: J:11 11: K:12 12: L:13 13: M:14 14: N:15 15: O:16 16: P:17 17: Q:18 18: R:19 19: S:20 20: T:21 21: U:22 22: V:23 23: W:24 24: X:25 25: Y:26 26: Z:27 Notice that B is 3? This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3. However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded. This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time. As such, it is probably a good idea to keep those generators as stateless as possible. Summary The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection. It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently. Tweet Technorati Tags: C#, .NET, Concurrent Collections, Collections, Little Wonders, Black Rabbit Coder,James Michael Hare

Developer IT