Search Results

Search found 377 results on 16 pages for 'bryan hare'.

Page 2/16 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • C#: String Concatenation vs Format vs StringBuilder

    - by James Michael Hare
    I was looking through my groups’ C# coding standards the other day and there were a couple of legacy items in there that caught my eye.  They had been passed down from committee to committee so many times that no one even thought to second guess and try them for a long time.  It’s yet another example of how micro-optimizations can often get the best of us and cause us to write code that is not as maintainable as it could be for the sake of squeezing an extra ounce of performance out of our software. So the two standards in question were these, in paraphrase: Prefer StringBuilder or string.Format() to string concatenation. Prefer string.Equals() with case-insensitive option to string.ToUpper().Equals(). Now some of you may already know what my results are going to show, as these items have been compared before on many blogs, but I think it’s always worth repeating and trying these yourself.  So let’s dig in. The first test was a pretty standard one.  When concattenating strings, what is the best choice: StringBuilder, string concattenation, or string.Format()? So before we being I read in a number of iterations from the console and a length of each string to generate.  Then I generate that many random strings of the given length and an array to hold the results.  Why am I so keen to keep the results?  Because I want to be able to snapshot the memory and don’t want garbage collection to collect the strings, hence the array to keep hold of them.  I also didn’t want the random strings to be part of the allocation, so I pre-allocate them and the array up front before the snapshot.  So in the code snippets below: num – Number of iterations. strings – Array of randomly generated strings. results – Array to hold the results of the concatenation tests. timer – A System.Diagnostics.Stopwatch() instance to time code execution. start – Beginning memory size. stop – Ending memory size. after – Memory size after final GC. So first, let’s look at the concatenation loop: 1: // build num strings using concattenation. 2: for (int i = 0; i < num; i++) 3: { 4: results[i] = "This is test #" + i + " with a result of " + strings[i]; 5: } Pretty standard, right?  Next for string.Format(): 1: // build strings using string.Format() 2: for (int i = 0; i < num; i++) 3: { 4: results[i] = string.Format("This is test #{0} with a result of {1}", i, strings[i]); 5: }   Finally, StringBuilder: 1: // build strings using StringBuilder 2: for (int i = 0; i < num; i++) 3: { 4: var builder = new StringBuilder(); 5: builder.Append("This is test #"); 6: builder.Append(i); 7: builder.Append(" with a result of "); 8: builder.Append(strings[i]); 9: results[i] = builder.ToString(); 10: } So I take each of these loops, and time them by using a block like this: 1: // get the total amount of memory used, true tells it to run GC first. 2: start = System.GC.GetTotalMemory(true); 3:  4: // restart the timer 5: timer.Reset(); 6: timer.Start(); 7:  8: // *** code to time and measure goes here. *** 9:  10: // get the current amount of memory, stop the timer, then get memory after GC. 11: stop = System.GC.GetTotalMemory(false); 12: timer.Stop(); 13: other = System.GC.GetTotalMemory(true); So let’s look at what happens when I run each of these blocks through the timer and memory check at 500,000 iterations: 1: Operator + - Time: 547, Memory: 56104540/55595960 - 500000 2: string.Format() - Time: 749, Memory: 57295812/55595960 - 500000 3: StringBuilder - Time: 608, Memory: 55312888/55595960 – 500000   Egad!  string.Format brings up the rear and + triumphs, well, at least in terms of speed.  The concat burns more memory than StringBuilder but less than string.Format().  This shows two main things: StringBuilder is not always the panacea many think it is. The difference between any of the three is miniscule! The second point is extremely important!  You will often here people who will grasp at results and say, “look, operator + is 10% faster than StringBuilder so always use StringBuilder.”  Statements like this are a disservice and often misleading.  For example, if I had a good guess at what the size of the string would be, I could have preallocated my StringBuffer like so:   1: for (int i = 0; i < num; i++) 2: { 3: // pre-declare StringBuilder to have 100 char buffer. 4: var builder = new StringBuilder(100); 5: builder.Append("This is test #"); 6: builder.Append(i); 7: builder.Append(" with a result of "); 8: builder.Append(strings[i]); 9: results[i] = builder.ToString(); 10: }   Now let’s look at the times: 1: Operator + - Time: 551, Memory: 56104412/55595960 - 500000 2: string.Format() - Time: 753, Memory: 57296484/55595960 - 500000 3: StringBuilder - Time: 525, Memory: 59779156/55595960 - 500000   Whoa!  All of the sudden StringBuilder is back on top again!  But notice, it takes more memory now.  This makes perfect sense if you examine the IL behind the scenes.  Whenever you do a string concat (+) in your code, it examines the lengths of the arguments and creates a StringBuilder behind the scenes of the appropriate size for you. But even IF we know the approximate size of our StringBuilder, look how much less readable it is!  That’s why I feel you should always take into account both readability and performance.  After all, consider all these timings are over 500,000 iterations.   That’s at best  0.0004 ms difference per call which is neglidgable at best.  The key is to pick the best tool for the job.  What do I mean?  Consider these awesome words of wisdom: Concatenate (+) is best at concatenating.  StringBuilder is best when you need to building. Format is best at formatting. Totally Earth-shattering, right!  But if you consider it carefully, it actually has a lot of beauty in it’s simplicity.  Remember, there is no magic bullet.  If one of these always beat the others we’d only have one and not three choices. The fact is, the concattenation operator (+) has been optimized for speed and looks the cleanest for joining together a known set of strings in the simplest manner possible. StringBuilder, on the other hand, excels when you need to build a string of inderterminant length.  Use it in those times when you are looping till you hit a stop condition and building a result and it won’t steer you wrong. String.Format seems to be the looser from the stats, but consider which of these is more readable.  Yes, ignore the fact that you could do this with ToString() on a DateTime.  1: // build a date via concatenation 2: var date1 = (month < 10 ? string.Empty : "0") + month + '/' 3: + (day < 10 ? string.Empty : "0") + '/' + year; 4:  5: // build a date via string builder 6: var builder = new StringBuilder(10); 7: if (month < 10) builder.Append('0'); 8: builder.Append(month); 9: builder.Append('/'); 10: if (day < 10) builder.Append('0'); 11: builder.Append(day); 12: builder.Append('/'); 13: builder.Append(year); 14: var date2 = builder.ToString(); 15:  16: // build a date via string.Format 17: var date3 = string.Format("{0:00}/{1:00}/{2:0000}", month, day, year); 18:  So the strength in string.Format is that it makes constructing a formatted string easy to read.  Yes, it’s slower, but look at how much more elegant it is to do zero-padding and anything else string.Format does. So my lesson is, don’t look for the silver bullet!  Choose the best tool.  Micro-optimization almost always bites you in the end because you’re sacrificing readability for performance, which is almost exactly the wrong choice 90% of the time. I love the rules of optimization.  They’ve been stated before in many forms, but here’s how I always remember them: For Beginners: Do not optimize. For Experts: Do not optimize yet. It’s so true.  Most of the time on today’s modern hardware, a micro-second optimization at the sake of readability will net you nothing because it won’t be your bottleneck.  Code for readability, choose the best tool for the job which will usually be the most readable and maintainable as well.  Then, and only then, if you need that extra performance boost after profiling your code and exhausting all other options… then you can start to think about optimizing.

    Read the article

  • C#/.NET Little Wonders: A Redux

    - by James Michael Hare
    I gave my Little Wonders presentation to the Topeka Dot Net Users' Group today, so re-posting the links to all the previous posts for them. The Presentation: C#/.NET Little Wonders: A Presentation The Original Trilogy: C#/.NET Five Little Wonders (part 1) C#/.NET Five More Little Wonders (part 2) C#/.NET Five Final Little Wonders (part 3) The Subsequent Sequels: C#/.NET Little Wonders: ToDictionary() and ToList() C#/.NET Little Wonders: DateTime is Packed With Goodies C#/.NET Little Wonders: Fun With Enum Methods C#/.NET Little Wonders: Cross-Calling Constructors C#/.NET Little Wonders: Constraining Generics With Where Clause C#/.NET Little Wonders: Comparer<T>.Default C#/.NET Little Wonders: The Useful (But Overlooked) Sets The Concurrent Wonders: C#/.NET Little Wonders: The Concurrent Collections (1 of 3) - ConcurrentQueue and ConcurrentStack C#/.NET Little Wonders: The Concurrent Collections (2 of 3) - ConcurrentDictionary Tweet   Technorati Tags: .NET,C#,Little Wonders

    Read the article

  • C#: Optional Parameters - Pros and Pitfalls

    - by James Michael Hare
    When Microsoft rolled out Visual Studio 2010 with C# 4, I was very excited to learn how I could apply all the new features and enhancements to help make me and my team more productive developers. Default parameters have been around forever in C++, and were intentionally omitted in Java in favor of using overloading to satisfy that need as it was though that having too many default parameters could introduce code safety issues.  To some extent I can understand that move, as I’ve been bitten by default parameter pitfalls before, but at the same time I feel like Java threw out the baby with the bathwater in that move and I’m glad to see C# now has them. This post briefly discusses the pros and pitfalls of using default parameters.  I’m avoiding saying cons, because I really don’t believe using default parameters is a negative thing, I just think there are things you must watch for and guard against to avoid abuses that can cause code safety issues. Pro: Default Parameters Can Simplify Code Let’s start out with positives.  Consider how much cleaner it is to reduce all the overloads in methods or constructors that simply exist to give the semblance of optional parameters.  For example, we could have a Message class defined which allows for all possible initializations of a Message: 1: public class Message 2: { 3: // can either cascade these like this or duplicate the defaults (which can introduce risk) 4: public Message() 5: : this(string.Empty) 6: { 7: } 8:  9: public Message(string text) 10: : this(text, null) 11: { 12: } 13:  14: public Message(string text, IDictionary<string, string> properties) 15: : this(text, properties, -1) 16: { 17: } 18:  19: public Message(string text, IDictionary<string, string> properties, long timeToLive) 20: { 21: // ... 22: } 23: }   Now consider the same code with default parameters: 1: public class Message 2: { 3: // can either cascade these like this or duplicate the defaults (which can introduce risk) 4: public Message(string text = "", IDictionary<string, string> properties = null, long timeToLive = -1) 5: { 6: // ... 7: } 8: }   Much more clean and concise and no repetitive coding!  In addition, in the past if you wanted to be able to cleanly supply timeToLive and accept the default on text and properties above, you would need to either create another overload, or pass in the defaults explicitly.  With named parameters, though, we can do this easily: 1: var msg = new Message(timeToLive: 100);   Pro: Named Parameters can Improve Readability I must say one of my favorite things with the default parameters addition in C# is the named parameters.  It lets code be a lot easier to understand visually with no comments.  Think how many times you’ve run across a TimeSpan declaration with 4 arguments and wondered if they were passing in days/hours/minutes/seconds or hours/minutes/seconds/milliseconds.  A novice running through your code may wonder what it is.  Named arguments can help resolve the visual ambiguity: 1: // is this days/hours/minutes/seconds (no) or hours/minutes/seconds/milliseconds (yes) 2: var ts = new TimeSpan(1, 2, 3, 4); 3:  4: // this however is visually very explicit 5: var ts = new TimeSpan(days: 1, hours: 2, minutes: 3, seconds: 4);   Or think of the times you’ve run across something passing a Boolean literal and wondered what it was: 1: // what is false here? 2: var sub = CreateSubscriber(hostname, port, false); 3:  4: // aha! Much more visibly clear 5: var sub = CreateSubscriber(hostname, port, isBuffered: false);   Pitfall: Don't Insert new Default Parameters In Between Existing Defaults Now let’s consider a two potential pitfalls.  The first is really an abuse.  It’s not really a fault of the default parameters themselves, but a fault in the use of them.  Let’s consider that Message constructor again with defaults.  Let’s say you want to add a messagePriority to the message and you think this is more important than a timeToLive value, so you decide to put messagePriority before it in the default, this gives you: 1: public class Message 2: { 3: public Message(string text = "", IDictionary<string, string> properties = null, int priority = 5, long timeToLive = -1) 4: { 5: // ... 6: } 7: }   Oh boy have we set ourselves up for failure!  Why?  Think of all the code out there that could already be using the library that already specified the timeToLive, such as this possible call: 1: var msg = new Message(“An error occurred”, myProperties, 1000);   Before this specified a message with a TTL of 1000, now it specifies a message with a priority of 1000 and a time to live of -1 (infinite).  All of this with NO compiler errors or warnings. So the rule to take away is if you are adding new default parameters to a method that’s currently in use, make sure you add them to the end of the list or create a brand new method or overload. Pitfall: Beware of Default Parameters in Inheritance and Interface Implementation Now, the second potential pitfalls has to do with inheritance and interface implementation.  I’ll illustrate with a puzzle: 1: public interface ITag 2: { 3: void WriteTag(string tagName = "ITag"); 4: } 5:  6: public class BaseTag : ITag 7: { 8: public virtual void WriteTag(string tagName = "BaseTag") { Console.WriteLine(tagName); } 9: } 10:  11: public class SubTag : BaseTag 12: { 13: public override void WriteTag(string tagName = "SubTag") { Console.WriteLine(tagName); } 14: } 15:  16: public static class Program 17: { 18: public static void Main() 19: { 20: SubTag subTag = new SubTag(); 21: BaseTag subByBaseTag = subTag; 22: ITag subByInterfaceTag = subTag; 23:  24: // what happens here? 25: subTag.WriteTag(); 26: subByBaseTag.WriteTag(); 27: subByInterfaceTag.WriteTag(); 28: } 29: }   What happens?  Well, even though the object in each case is SubTag whose tag is “SubTag”, you will get: 1: SubTag 2: BaseTag 3: ITag   Why?  Because default parameter are resolved at compile time, not runtime!  This means that the default does not belong to the object being called, but by the reference type it’s being called through.  Since the SubTag instance is being called through an ITag reference, it will use the default specified in ITag. So the moral of the story here is to be very careful how you specify defaults in interfaces or inheritance hierarchies.  I would suggest avoiding repeating them, and instead concentrating on the layer of classes or interfaces you must likely expect your caller to be calling from. For example, if you have a messaging factory that returns an IMessage which can be either an MsmqMessage or JmsMessage, it only makes since to put the defaults at the IMessage level since chances are your user will be using the interface only. So let’s sum up.  In general, I really love default and named parameters in C# 4.0.  I think they’re a great tool to help make your code easier to read and maintain when used correctly. On the plus side, default parameters: Reduce redundant overloading for the sake of providing optional calling structures. Improve readability by being able to name an ambiguous argument. But remember to make sure you: Do not insert new default parameters in the middle of an existing set of default parameters, this may cause unpredictable behavior that may not necessarily throw a syntax error – add to end of list or create new method. Be extremely careful how you use default parameters in inheritance hierarchies and interfaces – choose the most appropriate level to add the defaults based on expected usage. Technorati Tags: C#,.NET,Software,Default Parameters

    Read the article

  • C#/.NET Little Wonders: Of LINQ and Lambdas - A Presentation

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Today I’m giving a brief beginner’s guide to LINQ and Lambdas at the St. Louis .NET User’s Group so I thought I’d post the presentation here as well.  I updated the presentation a bit as well as added some notes on the query syntax.  Enjoy! The C#/.NET Fundaments: Of Lambdas and LINQ Presentation Of Lambdas and LINQ View more presentations from BlackRabbitCoder   Technorati Tags: C#, CSharp, .NET, Little Wonders, LINQ, Lambdas

    Read the article

  • C#/.NET Little Wonders: ConcurrentBag and BlockingCollection

    - by James Michael Hare
    In the first week of concurrent collections, began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>.  The last post discussed the ConcurrentDictionary<T> .  Finally this week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see C#/.NET Little Wonders: A Redux. Recap As you'll recall from the previous posts, the original collections were object-based containers that accomplished synchronization through a Synchronized member.  With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization.  With .NET 4.0, a new breed of collections was born in the System.Collections.Concurrent namespace.  Of these, the final concurrent collection we will examine is the ConcurrentBag and a very useful wrapper class called the BlockingCollection. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this informative whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentBag<T> – Thread-safe unordered collection. Unlike the other concurrent collections, the ConcurrentBag<T> has no non-concurrent counterpart in the .NET collections libraries.  Items can be added and removed from a bag just like any other collection, but unlike the other collections, the items are not maintained in any order.  This makes the bag handy for those cases when all you care about is that the data be consumed eventually, without regard for order of consumption or even fairness – that is, it’s possible new items could be consumed before older items given the right circumstances for a period of time. So why would you ever want a container that can be unfair?  Well, to look at it another way, you can use a ConcurrentQueue and get the fairness, but it comes at a cost in that the ordering rules and synchronization required to maintain that ordering can affect scalability a bit.  Thus sometimes the bag is great when you want the fastest way to get the next item to process, and don’t care what item it is or how long its been waiting. The way that the ConcurrentBag works is to take advantage of the new ThreadLocal<T> type (new in System.Threading for .NET 4.0) so that each thread using the bag has a list local to just that thread.  This means that adding or removing to a thread-local list requires very low synchronization.  The problem comes in where a thread goes to consume an item but it’s local list is empty.  In this case the bag performs “work-stealing” where it will rob an item from another thread that has items in its list.  This requires a higher level of synchronization which adds a bit of overhead to the take operation. So, as you can imagine, this makes the ConcurrentBag good for situations where each thread both produces and consumes items from the bag, but it would be less-than-idea in situations where some threads are dedicated producers and the other threads are dedicated consumers because the work-stealing synchronization would outweigh the thread-local optimization for a thread taking its own items. Like the other concurrent collections, there are some curiosities to keep in mind: IsEmpty(), Count, ToArray(), and GetEnumerator() lock collection Each of these needs to take a snapshot of whole bag to determine if empty, thus they tend to be more expensive and cause Add() and Take() operations to block. ToArray() and GetEnumerator() are static snapshots Because it is based on a snapshot, will not show subsequent updates after snapshot. Add() is lightweight Since adding to the thread-local list, there is very little overhead on Add. TryTake() is lightweight if items in thread-local list As long as items are in the thread-local list, TryTake() is very lightweight, much more so than ConcurrentStack() and ConcurrentQueue(), however if the local thread list is empty, it must steal work from another thread, which is more expensive. Remember, a bag is not ideal for all situations, it is mainly ideal for situations where a process consumes an item and either decomposes it into more items to be processed, or handles the item partially and places it back to be processed again until some point when it will complete.  The main point is that the bag works best when each thread both takes and adds items. For example, we could create a totally contrived example where perhaps we want to see the largest power of a number before it crosses a certain threshold.  Yes, obviously we could easily do this with a log function, but bare with me while I use this contrived example for simplicity. So let’s say we have a work function that will take a Tuple out of a bag, this Tuple will contain two ints.  The first int is the original number, and the second int is the last multiple of that number.  So we could load our bag with the initial values (let’s say we want to know the last multiple of each of 2, 3, 5, and 7 under 100. 1: var bag = new ConcurrentBag<Tuple<int, int>> 2: { 3: Tuple.Create(2, 1), 4: Tuple.Create(3, 1), 5: Tuple.Create(5, 1), 6: Tuple.Create(7, 1) 7: }; Then we can create a method that given the bag, will take out an item, apply the multiplier again, 1: public static void FindHighestPowerUnder(ConcurrentBag<Tuple<int,int>> bag, int threshold) 2: { 3: Tuple<int,int> pair; 4:  5: // while there are items to take, this will prefer local first, then steal if no local 6: while (bag.TryTake(out pair)) 7: { 8: // look at next power 9: var result = Math.Pow(pair.Item1, pair.Item2 + 1); 10:  11: if (result < threshold) 12: { 13: // if smaller than threshold bump power by 1 14: bag.Add(Tuple.Create(pair.Item1, pair.Item2 + 1)); 15: } 16: else 17: { 18: // otherwise, we're done 19: Console.WriteLine("Highest power of {0} under {3} is {0}^{1} = {2}.", 20: pair.Item1, pair.Item2, Math.Pow(pair.Item1, pair.Item2), threshold); 21: } 22: } 23: } Now that we have this, we can load up this method as an Action into our Tasks and run it: 1: // create array of tasks, start all, wait for all 2: var tasks = new[] 3: { 4: new Task(() => FindHighestPowerUnder(bag, 100)), 5: new Task(() => FindHighestPowerUnder(bag, 100)), 6: }; 7:  8: Array.ForEach(tasks, t => t.Start()); 9:  10: Task.WaitAll(tasks); Totally contrived, I know, but keep in mind the main point!  When you have a thread or task that operates on an item, and then puts it back for further consumption – or decomposes an item into further sub-items to be processed – you should consider a ConcurrentBag as the thread-local lists will allow for quick processing.  However, if you need ordering or if your processes are dedicated producers or consumers, this collection is not ideal.  As with anything, you should performance test as your mileage will vary depending on your situation! BlockingCollection<T> – A producers & consumers pattern collection The BlockingCollection<T> can be treated like a collection in its own right, but in reality it adds a producers and consumers paradigm to any collection that implements the interface IProducerConsumerCollection<T>.  If you don’t specify one at the time of construction, it will use a ConcurrentQueue<T> as its underlying store. If you don’t want to use the ConcurrentQueue, the ConcurrentStack and ConcurrentBag also implement the interface (though ConcurrentDictionary does not).  In addition, you are of course free to create your own implementation of the interface. So, for those who don’t remember the producers and consumers classical computer-science problem, the gist of it is that you have one (or more) processes that are creating items (producers) and one (or more) processes that are consuming these items (consumers).  Now, the crux of the problem is that there is a bin (queue) where the produced items are placed, and typically that bin has a limited size.  Thus if a producer creates an item, but there is no space to store it, it must wait until an item is consumed.  Also if a consumer goes to consume an item and none exists, it must wait until an item is produced. The BlockingCollection makes it trivial to implement any standard producers/consumers process set by providing that “bin” where the items can be produced into and consumed from with the appropriate blocking operations.  In addition, you can specify whether the bin should have a limited size or can be (theoretically) unbounded, and you can specify timeouts on the blocking operations. As far as your choice of “bin”, for the most part the ConcurrentQueue is the right choice because it is fairly light and maximizes fairness by ordering items so that they are consumed in the same order they are produced.  You can use the concurrent bag or stack, of course, but your ordering would be random-ish in the case of the former and LIFO in the case of the latter. So let’s look at some of the methods of note in BlockingCollection: BoundedCapacity returns capacity of the “bin” If the bin is unbounded, the capacity is int.MaxValue. Count returns an internally-kept count of items This makes it O(1), but if you modify underlying collection directly (not recommended) it is unreliable. CompleteAdding() is used to cut off further adds. This sets IsAddingCompleted and begins to wind down consumers once empty. IsAddingCompleted is true when producers are “done”. Once you are done producing, should complete the add process to alert consumers. IsCompleted is true when producers are “done” and “bin” is empty. Once you mark the producers done, and all items removed, this will be true. Add() is a blocking add to collection. If bin is full, will wait till space frees up Take() is a blocking remove from collection. If bin is empty, will wait until item is produced or adding is completed. GetConsumingEnumerable() is used to iterate and consume items. Unlike the standard enumerator, this one consumes the items instead of iteration. TryAdd() attempts add but does not block completely If adding would block, returns false instead, can specify TimeSpan to wait before stopping. TryTake() attempts to take but does not block completely Like TryAdd(), if taking would block, returns false instead, can specify TimeSpan to wait. Note the use of CompleteAdding() to signal the BlockingCollection that nothing else should be added.  This means that any attempts to TryAdd() or Add() after marked completed will throw an InvalidOperationException.  In addition, once adding is complete you can still continue to TryTake() and Take() until the bin is empty, and then Take() will throw the InvalidOperationException and TryTake() will return false. So let’s create a simple program to try this out.  Let’s say that you have one process that will be producing items, but a slower consumer process that handles them.  This gives us a chance to peek inside what happens when the bin is bounded (by default, the bin is NOT bounded). 1: var bin = new BlockingCollection<int>(5); Now, we create a method to produce items: 1: public static void ProduceItems(BlockingCollection<int> bin, int numToProduce) 2: { 3: for (int i = 0; i < numToProduce; i++) 4: { 5: // try for 10 ms to add an item 6: while (!bin.TryAdd(i, TimeSpan.FromMilliseconds(10))) 7: { 8: Console.WriteLine("Bin is full, retrying..."); 9: } 10: } 11:  12: // once done producing, call CompleteAdding() 13: Console.WriteLine("Adding is completed."); 14: bin.CompleteAdding(); 15: } And one to consume them: 1: public static void ConsumeItems(BlockingCollection<int> bin) 2: { 3: // This will only be true if CompleteAdding() was called AND the bin is empty. 4: while (!bin.IsCompleted) 5: { 6: int item; 7:  8: if (!bin.TryTake(out item, TimeSpan.FromMilliseconds(10))) 9: { 10: Console.WriteLine("Bin is empty, retrying..."); 11: } 12: else 13: { 14: Console.WriteLine("Consuming item {0}.", item); 15: Thread.Sleep(TimeSpan.FromMilliseconds(20)); 16: } 17: } 18: } Then we can fire them off: 1: // create one producer and two consumers 2: var tasks = new[] 3: { 4: new Task(() => ProduceItems(bin, 20)), 5: new Task(() => ConsumeItems(bin)), 6: new Task(() => ConsumeItems(bin)), 7: }; 8:  9: Array.ForEach(tasks, t => t.Start()); 10:  11: Task.WaitAll(tasks); Notice that the producer is faster than the consumer, thus it should be hitting a full bin often and displaying the message after it times out on TryAdd(). 1: Consuming item 0. 2: Consuming item 1. 3: Bin is full, retrying... 4: Bin is full, retrying... 5: Consuming item 3. 6: Consuming item 2. 7: Bin is full, retrying... 8: Consuming item 4. 9: Consuming item 5. 10: Bin is full, retrying... 11: Consuming item 6. 12: Consuming item 7. 13: Bin is full, retrying... 14: Consuming item 8. 15: Consuming item 9. 16: Bin is full, retrying... 17: Consuming item 10. 18: Consuming item 11. 19: Bin is full, retrying... 20: Consuming item 12. 21: Consuming item 13. 22: Bin is full, retrying... 23: Bin is full, retrying... 24: Consuming item 14. 25: Adding is completed. 26: Consuming item 15. 27: Consuming item 16. 28: Consuming item 17. 29: Consuming item 19. 30: Consuming item 18. Also notice that once CompleteAdding() is called and the bin is empty, the IsCompleted property returns true, and the consumers will exit. Summary The ConcurrentBag is an interesting collection that can be used to optimize concurrency scenarios where tasks or threads both produce and consume items.  In this way, it will choose to consume its own work if available, and then steal if not.  However, in situations where you want fair consumption or ordering, or in situations where the producers and consumers are distinct processes, the bag is not optimal. The BlockingCollection is a great wrapper around all of the concurrent queue, stack, and bag that allows you to add producer and consumer semantics easily including waiting when the bin is full or empty. That’s the end of my dive into the concurrent collections.  I’d also strongly recommend, once again, you read this excellent Microsoft white paper that goes into much greater detail on the efficiencies you can gain using these collections judiciously (here). Tweet Technorati Tags: C#,.NET,Concurrent Collections,Little Wonders

    Read the article

  • C#/.NET Little Wonders: Fun With Enum Methods

    - by James Michael Hare
    Once again lets dive into the Little Wonders of .NET, those small things in the .NET languages and BCL classes that make development easier by increasing readability, maintainability, and/or performance. So probably every one of us has used an enumerated type at one time or another in a C# program.  The enumerated types we create are a great way to represent that a value can be one of a set of discrete values (or a combination of those values in the case of bit flags). But the power of enum types go far beyond simple assignment and comparison, there are many methods in the Enum class (that all enum types “inherit” from) that can give you even more power when dealing with them. IsDefined() – check if a given value exists in the enum Are you reading a value for an enum from a data source, but are unsure if it is actually a valid value or not?  Casting won’t tell you this, and Parse() isn’t guaranteed to balk either if you give it an int or a combination of flags.  So what can we do? Let’s assume we have a small enum like this for result codes we want to return back from our business logic layer: 1: public enum ResultCode 2: { 3: Success, 4: Warning, 5: Error 6: } In this enum, Success will be zero (unless given another value explicitly), Warning will be one, and Error will be two. So what happens if we have code like this where perhaps we’re getting the result code from another data source (could be database, could be web service, etc)? 1: public ResultCode PerformAction() 2: { 3: // set up and call some method that returns an int. 4: int result = ResultCodeFromDataSource(); 5:  6: // this will suceed even if result is < 0 or > 2. 7: return (ResultCode) result; 8: } So what happens if result is –1 or 4?  Well, the cast does not fail, so what we end up with would be an instance of a ResultCode that would have a value that’s outside of the bounds of the enum constants we defined. This means if you had a block of code like: 1: switch (result) 2: { 3: case ResultType.Success: 4: // do success stuff 5: break; 6:  7: case ResultType.Warning: 8: // do warning stuff 9: break; 10:  11: case ResultType.Error: 12: // do error stuff 13: break; 14: } That you would hit none of these blocks (which is a good argument for always having a default in a switch by the way). So what can you do?  Well, there is a handy static method called IsDefined() on the Enum class which will tell you if an enum value is defined.  1: public ResultCode PerformAction() 2: { 3: int result = ResultCodeFromDataSource(); 4:  5: if (!Enum.IsDefined(typeof(ResultCode), result)) 6: { 7: throw new InvalidOperationException("Enum out of range."); 8: } 9:  10: return (ResultCode) result; 11: } In fact, this is often recommended after you Parse() or cast a value to an enum as there are ways for values to get past these methods that may not be defined. If you don’t like the syntax of passing in the type of the enum, you could clean it up a bit by creating an extension method instead that would allow you to call IsDefined() off any isntance of the enum: 1: public static class EnumExtensions 2: { 3: // helper method that tells you if an enum value is defined for it's enumeration 4: public static bool IsDefined(this Enum value) 5: { 6: return Enum.IsDefined(value.GetType(), value); 7: } 8: }   HasFlag() – an easier way to see if a bit (or bits) are set Most of us who came from the land of C programming have had to deal extensively with bit flags many times in our lives.  As such, using bit flags may be almost second nature (for a quick refresher on bit flags in enum types see one of my old posts here). However, in higher-level languages like C#, the need to manipulate individual bit flags is somewhat diminished, and the code to check for bit flag enum values may be obvious to an advanced developer but cryptic to a novice developer. For example, let’s say you have an enum for a messaging platform that contains bit flags: 1: // usually, we pluralize flags enum type names 2: [Flags] 3: public enum MessagingOptions 4: { 5: None = 0, 6: Buffered = 0x01, 7: Persistent = 0x02, 8: Durable = 0x04, 9: Broadcast = 0x08 10: } We can combine these bit flags using the bitwise OR operator (the ‘|’ pipe character): 1: // combine bit flags using 2: var myMessenger = new Messenger(MessagingOptions.Buffered | MessagingOptions.Broadcast); Now, if we wanted to check the flags, we’d have to test then using the bit-wise AND operator (the ‘&’ character): 1: if ((options & MessagingOptions.Buffered) == MessagingOptions.Buffered) 2: { 3: // do code to set up buffering... 4: // ... 5: } While the ‘|’ for combining flags is easy enough to read for advanced developers, the ‘&’ test tends to be easy for novice developers to get wrong.  First of all you have to AND the flag combination with the value, and then typically you should test against the flag combination itself (and not just for a non-zero)!  This is because the flag combination you are testing with may combine multiple bits, in which case if only one bit is set, the result will be non-zero but not necessarily all desired bits! Thanks goodness in .NET 4.0 they gave us the HasFlag() method.  This method can be called from an enum instance to test to see if a flag is set, and best of all you can avoid writing the bit wise logic yourself.  Not to mention it will be more readable to a novice developer as well: 1: if (options.HasFlag(MessagingOptions.Buffered)) 2: { 3: // do code to set up buffering... 4: // ... 5: } It is much more concise and unambiguous, thus increasing your maintainability and readability. It would be nice to have a corresponding SetFlag() method, but unfortunately generic types don’t allow you to specialize on Enum, which makes it a bit more difficult.  It can be done but you have to do some conversions to numeric and then back to the enum which makes it less of a payoff than having the HasFlag() method.  But if you want to create it for symmetry, it would look something like this: 1: public static T SetFlag<T>(this Enum value, T flags) 2: { 3: if (!value.GetType().IsEquivalentTo(typeof(T))) 4: { 5: throw new ArgumentException("Enum value and flags types don't match."); 6: } 7:  8: // yes this is ugly, but unfortunately we need to use an intermediate boxing cast 9: return (T)Enum.ToObject(typeof (T), Convert.ToUInt64(value) | Convert.ToUInt64(flags)); 10: } Note that since the enum types are value types, we need to assign the result to something (much like string.Trim()).  Also, you could chain several SetFlag() operations together or create one that takes a variable arg list if desired. Parse() and ToString() – transitioning from string to enum and back Sometimes, you may want to be able to parse an enum from a string or convert it to a string - Enum has methods built in to let you do this.  Now, many may already know this, but may not appreciate how much power are in these two methods. For example, if you want to parse a string as an enum, it’s easy and works just like you’d expect from the numeric types: 1: string optionsString = "Persistent"; 2:  3: // can use Enum.Parse, which throws if finds something it doesn't like... 4: var result = (MessagingOptions)Enum.Parse(typeof (MessagingOptions), optionsString); 5:  6: if (result == MessagingOptions.Persistent) 7: { 8: Console.WriteLine("It worked!"); 9: } Note that Enum.Parse() will throw if it finds a value it doesn’t like.  But the values it likes are fairly flexible!  You can pass in a single value, or a comma separated list of values for flags and it will parse them all and set all bits: 1: // for string values, can have one, or comma separated. 2: string optionsString = "Persistent, Buffered"; 3:  4: var result = (MessagingOptions)Enum.Parse(typeof (MessagingOptions), optionsString); 5:  6: if (result.HasFlag(MessagingOptions.Persistent) && result.HasFlag(MessagingOptions.Buffered)) 7: { 8: Console.WriteLine("It worked!"); 9: } Or you can parse in a string containing a number that represents a single value or combination of values to set: 1: // 3 is the combination of Buffered (0x01) and Persistent (0x02) 2: var optionsString = "3"; 3:  4: var result = (MessagingOptions) Enum.Parse(typeof (MessagingOptions), optionsString); 5:  6: if (result.HasFlag(MessagingOptions.Persistent) && result.HasFlag(MessagingOptions.Buffered)) 7: { 8: Console.WriteLine("It worked again!"); 9: } And, if you really aren’t sure if the parse will work, and don’t want to handle an exception, you can use TryParse() instead: 1: string optionsString = "Persistent, Buffered"; 2: MessagingOptions result; 3:  4: // try parse returns true if successful, and takes an out parm for the result 5: if (Enum.TryParse(optionsString, out result)) 6: { 7: if (result.HasFlag(MessagingOptions.Persistent) && result.HasFlag(MessagingOptions.Buffered)) 8: { 9: Console.WriteLine("It worked!"); 10: } 11: } So we covered parsing a string to an enum, what about reversing that and converting an enum to a string?  The ToString() method is the obvious and most basic choice for most of us, but did you know you can pass a format string for enum types that dictate how they are written as a string?: 1: MessagingOptions value = MessagingOptions.Buffered | MessagingOptions.Persistent; 2:  3: // general format, which is the default, 4: Console.WriteLine("Default : " + value); 5: Console.WriteLine("G (default): " + value.ToString("G")); 6:  7: // Flags format, even if type does not have Flags attribute. 8: Console.WriteLine("F (flags) : " + value.ToString("F")); 9:  10: // integer format, value as number. 11: Console.WriteLine("D (num) : " + value.ToString("D")); 12:  13: // hex format, value as hex 14: Console.WriteLine("X (hex) : " + value.ToString("X")); Which displays: 1: Default : Buffered, Persistent 2: G (default): Buffered, Persistent 3: F (flags) : Buffered, Persistent 4: D (num) : 3 5: X (hex) : 00000003 Now, you may not really see a difference here between G and F because I used a [Flags] enum, the difference is that the “F” option treats the enum as if it were flags even if the [Flags] attribute is not present.  Let’s take a non-flags enum like the ResultCode used earlier: 1: // yes, we can do this even if it is not [Flags] enum. 2: ResultCode value = ResultCode.Warning | ResultCode.Error; And if we run that through the same formats again we get: 1: Default : 3 2: G (default): 3 3: F (flags) : Warning, Error 4: D (num) : 3 5: X (hex) : 00000003 Notice that since we had multiple values combined, but it was not a [Flags] marked enum, the G and default format gave us a number instead of a value name.  This is because the value was not a valid single-value constant of the enum.  However, using the F flags format string, it broke out the value into its component flags even though it wasn’t marked [Flags]. So, if you want to get an enum to display appropriately for whether or not it has the [Flags] attribute, use G which is the default.  If you always want it to attempt to break down the flags, use F.  For numeric output, obviously D or  X are the best choice depending on whether you want decimal or hex. Summary Hopefully, you learned a couple of new tricks with using the Enum class today!  I’ll add more little wonders as I think of them and thanks for all the invaluable input!   Technorati Tags: C#,.NET,Little Wonders,Enum,BlackRabbitCoder

    Read the article

  • Review: A Quick Look at Reflector

    - by James Michael Hare
    I, like many, was disappointed when I heard that Reflector 7 was not free, and perhaps that’s why I waited so long to try it and just kept using my version 6 (which continues to be free).  But though I resisted for so long, I longed for the better features that were being developed, and began to wonder if I should upgrade.  Thus, I began to look into the features being offered in Reflector 7.5 to see what was new. Multiple Editions Reflector 7.5 comes in three flavors, each building on the features of the previous version: Standard – Contains just the Standalone application ($70) VS – Same as Standard but adds Reflector Object Browser for Visual Studio ($130) VSPro – Same as VS but adds ability to set breakpoints and step into decompiled code ($190) So let’s examine each of these features. The Standalone Application (Standard, VS, VSPro editions) Popping open Reflector 7.5 and looking at the GUI, we see much of the same familiar features, with a few new ones as well: Most notably, the disassembler window now has a tabbed window with navigation buttons.  This makes it much easier to back out of a deep-dive into many layers of decompiled code back to a previous point. Also, there is now an analyzer which can be used to determine dependencies for a given method, property, type, etc. For example, if we select System.Net.Sockets.TcpClient and hit the Analyze button, we’d see a window with the following nodes we could expand: This gives us the ability to see what a given type uses, what uses it, who exposes it, and who instantiates it. Now obviously, for low-level types (like DateTime) this list would be enormous, but this can give a lot of information on how a given type is connected to the larger code ecosystem. One of the other things I like about using Reflector 7.5 is that it does a much better job of displaying iterator blocks than Reflector 6 did. For example, if you were to take a look at the Enumerable.Cast() extension method in System.Linq, and dive into the CastIterator in Reflector 6, you’d see this: But now, in Reflector 7.5, we see the iterator logic much more clearly: This is a big improvement in the quality of their code disassembler and for me was one of the main reasons I decided to take the plunge and get version 7.5. The Reflector Object Browser (VS, VSPro editions) If you have the .NET Reflector VS or VSPro editions, you’ll find you have in Visual Studio a Reflector Object Browser window available where you can select and decompile any assembly right in Visual Studio. For example, if you want to take a peek at how System.Collections.Generic.List<T> works, you can either select List<T> in the Reflector Object Browser, or even simpler just select a usage of it in your code and CTRL + Click to dive in. – And it takes you right to a source window with the decompiled source: Setting Breakpoints and Stepping Into Decompiled Code (VSPro) If you have the VSPro edition, in addition to all the things said above, you also get the additional ability to set breakpoints in this decompiled code and step through it as if it were your own code: This can be a handy feature when you need to see why your code’s use of a BCL or other third-party library isn’t working as you expect. Summary Yes, Reflector is no longer free, and yes, that’s a bit of a bummer. But it always was and still is a very fine tool. If you still have Reflector 6, you aren’t forced to upgrade any longer, but getting the nicer disassembler (especially for iterator blocks) and the handy VS integration is worth at least considering upgrading for.  So I leave it up to you, these are some of the features of Reflector 7.5, what’s your thoughts? Technorati Tags: .NET,Reflector

    Read the article

  • Of C# Iterators and Performance

    - by James Michael Hare
    Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++.  Nope, I'm talking C# iterators.  No, not enumerators, iterators.   So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well.   Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do.  I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?"   Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post.  Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration?   So, to begin, let's look at a couple of methods.  This is a new (albeit contrived) method called Every(...).  The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first).  So Every(2) would return items 0, 2, 4, 6, etc.   Now, if you wanted to write this in the traditional way, you may come up with something like this:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         List<T> newList = new List<T>();         int count = 0;           foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 newList.Add(i);             }         }           return newList;     }     So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item.  Pretty straight forward.   The problem?  Well, Every<T>(...) will construct a list containing every nth item whether or not you care.  What happens if you were searching this result for a certain item and find that item after five tries?  You would have generated the rest of the list for nothing.   Enter iterators.  This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for.  This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list.   We see this all the time in Linq, where many expressions are chained together to do complex processing on a list.  This would be very expensive if each of these expressions evaluated their entire possible result set on call.    Let's look at the same example function, this time using an iterator:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         int count = 0;         foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 yield return i;             }         }     }   Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement.  What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends.   Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time.   It doesn't seem like a big deal, does it?  But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit.   This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained.    For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10.  We could write that as:       List<int> someList = new List<int>();         // fill list here         someList.Every(10).TakeWhile(i => i <= 10);     Now is the difference more apparent?  If we use the first form of Every that makes a copy of the list.  It's going to copy the entire list whether we will need those items or not, that can be costly!    With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated.   So, sounds neat eh?  But what's the cost is what you're probably wondering.  So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things.    Now, iteration isn't free.  If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead:   Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521   Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists.  In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price.  However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect.   However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead.  Let's look at what happens if you stop looking after 80% of the list:   Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010       Notice that the iterator form is now operating quite a bit faster.  But the savings really add up if you stop on average at 50% (which most searches would typically do):     Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336   Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time.  And this is only for one level deep.  If we are using this in a chain of query expressions it only adds to the savings.   So my recommendation?  If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator.  The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains.   Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

    Read the article

  • C#/.NET Little Wonders: The Predicate, Comparison, and Converter Generic Delegates

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. In the last three weeks, we examined the Action family of delegates (and delegates in general), the Func family of delegates, and the EventHandler family of delegates and how they can be used to support generic, reusable algorithms and classes. This week I will be completing my series on the generic delegates in the .NET Framework with a discussion of three more, somewhat less used, generic delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. These are older generic delegates that were introduced in .NET 2.0, mostly for use in the Array and List<T> classes.  Though older, it’s good to have an understanding of them and their intended purpose.  In addition, you can feel free to use them yourself, though obviously you can also use the equivalents from the Func family of delegates instead. Predicate<T> – delegate for determining matches The Predicate<T> delegate was a very early delegate developed in the .NET 2.0 Framework to determine if an item was a match for some condition in a List<T> or T[].  The methods that tend to use the Predicate<T> include: Find(), FindAll(), FindLast() Uses the Predicate<T> delegate to finds items, in a list/array of type T, that matches the given predicate. FindIndex(), FindLastIndex() Uses the Predicate<T> delegate to find the index of an item, of in a list/array of type T, that matches the given predicate. The signature of the Predicate<T> delegate (ignoring variance for the moment) is: 1: public delegate bool Predicate<T>(T obj); So, this is a delegate type that supports any method taking an item of type T and returning bool.  In addition, there is a semantic understanding that this predicate is supposed to be examining the item supplied to see if it matches a given criteria. 1: // finds first even number (2) 2: var firstEven = Array.Find(numbers, n => (n % 2) == 0); 3:  4: // finds all odd numbers (1, 3, 5, 7, 9) 5: var allEvens = Array.FindAll(numbers, n => (n % 2) == 1); 6:  7: // find index of first multiple of 5 (4) 8: var firstFiveMultiplePos = Array.FindIndex(numbers, n => (n % 5) == 0); This delegate has typically been succeeded in LINQ by the more general Func family, so that Predicate<T> and Func<T, bool> are logically identical.  Strictly speaking, though, they are different types, so a delegate reference of type Predicate<T> cannot be directly assigned to a delegate reference of type Func<T, bool>, though the same method can be assigned to both. 1: // SUCCESS: the same lambda can be assigned to either 2: Predicate<DateTime> isSameDayPred = dt => dt.Date == DateTime.Today; 3: Func<DateTime, bool> isSameDayFunc = dt => dt.Date == DateTime.Today; 4:  5: // ERROR: once they are assigned to a delegate type, they are strongly 6: // typed and cannot be directly assigned to other delegate types. 7: isSameDayPred = isSameDayFunc; When you assign a method to a delegate, all that is required is that the signature matches.  This is why the same method can be assigned to either delegate type since their signatures are the same.  However, once the method has been assigned to a delegate type, it is now a strongly-typed reference to that delegate type, and it cannot be assigned to a different delegate type (beyond the bounds of variance depending on Framework version, of course). Comparison<T> – delegate for determining order Just as the Predicate<T> generic delegate was birthed to give Array and List<T> the ability to perform type-safe matching, the Comparison<T> was birthed to give them the ability to perform type-safe ordering. The Comparison<T> is used in Array and List<T> for: Sort() A form of the Sort() method that takes a comparison delegate; this is an alternate way to custom sort a list/array from having to define custom IComparer<T> classes. The signature for the Comparison<T> delegate looks like (without variance): 1: public delegate int Comparison<T>(T lhs, T rhs); The goal of this delegate is to compare the left-hand-side to the right-hand-side and return a negative number if the lhs < rhs, zero if they are equal, and a positive number if the lhs > rhs.  Generally speaking, null is considered to be the smallest value of any reference type, so null should always be less than non-null, and two null values should be considered equal. In most sort/ordering methods, you must specify an IComparer<T> if you want to do custom sorting/ordering.  The Array and List<T> types, however, also allow for an alternative Comparison<T> delegate to be used instead, essentially, this lets you perform the custom sort without having to have the custom IComparer<T> class defined. It should be noted, however, that the LINQ OrderBy(), and ThenBy() family of methods do not support the Comparison<T> delegate (though one could easily add their own extension methods to create one, or create an IComparer() factory class that generates one from a Comparison<T>). So, given this delegate, we could use it to perform easy sorts on an Array or List<T> based on custom fields.  Say for example we have a data class called Employee with some basic employee information: 1: public sealed class Employee 2: { 3: public string Name { get; set; } 4: public int Id { get; set; } 5: public double Salary { get; set; } 6: } And say we had a List<Employee> that contained data, such as: 1: var employees = new List<Employee> 2: { 3: new Employee { Name = "John Smith", Id = 2, Salary = 37000.0 }, 4: new Employee { Name = "Jane Doe", Id = 1, Salary = 57000.0 }, 5: new Employee { Name = "John Doe", Id = 5, Salary = 60000.0 }, 6: new Employee { Name = "Jane Smith", Id = 3, Salary = 59000.0 } 7: }; Now, using the Comparison<T> delegate form of Sort() on the List<Employee>, we can sort our list many ways: 1: // sort based on employee ID 2: employees.Sort((lhs, rhs) => Comparer<int>.Default.Compare(lhs.Id, rhs.Id)); 3:  4: // sort based on employee name 5: employees.Sort((lhs, rhs) => string.Compare(lhs.Name, rhs.Name)); 6:  7: // sort based on salary, descending (note switched lhs/rhs order for descending) 8: employees.Sort((lhs, rhs) => Comparer<double>.Default.Compare(rhs.Salary, lhs.Salary)); So again, you could use this older delegate, which has a lot of logical meaning to it’s name, or use a generic delegate such as Func<T, T, int> to implement the same sort of behavior.  All this said, one of the reasons, in my opinion, that Comparison<T> isn’t used too often is that it tends to need complex lambdas, and the LINQ ability to order based on projections is much easier to use, though the Array and List<T> sorts tend to be more efficient if you want to perform in-place ordering. Converter<TInput, TOutput> – delegate to convert elements The Converter<TInput, TOutput> delegate is used by the Array and List<T> delegate to specify how to convert elements from an array/list of one type (TInput) to another type (TOutput).  It is used in an array/list for: ConvertAll() Converts all elements from a List<TInput> / TInput[] to a new List<TOutput> / TOutput[]. The delegate signature for Converter<TInput, TOutput> is very straightforward (ignoring variance): 1: public delegate TOutput Converter<TInput, TOutput>(TInput input); So, this delegate’s job is to taken an input item (of type TInput) and convert it to a return result (of type TOutput).  Again, this is logically equivalent to a newer Func delegate with a signature of Func<TInput, TOutput>.  In fact, the latter is how the LINQ conversion methods are defined. So, we could use the ConvertAll() syntax to convert a List<T> or T[] to different types, such as: 1: // get a list of just employee IDs 2: var empIds = employees.ConvertAll(emp => emp.Id); 3:  4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.ConvertAll(emp => (int)emp.Salary); Note that the expressions above are logically equivalent to using LINQ’s Select() method, which gives you a lot more power: 1: // get a list of just employee IDs 2: var empIds = employees.Select(emp => emp.Id).ToList(); 3:  4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.Select(emp => (int)emp.Salary).ToList(); The only difference with using LINQ is that many of the methods (including Select()) are deferred execution, which means that often times they will not perform the conversion for an item until it is requested.  This has both pros and cons in that you gain the benefit of not performing work until it is actually needed, but on the flip side if you want the results now, there is overhead in the behind-the-scenes work that support deferred execution (it’s supported by the yield return / yield break keywords in C# which define iterators that maintain current state information). In general, the new LINQ syntax is preferred, but the older Array and List<T> ConvertAll() methods are still around, as is the Converter<TInput, TOutput> delegate. Sidebar: Variance support update in .NET 4.0 Just like our descriptions of Func and Action, these three early generic delegates also support more variance in assignment as of .NET 4.0.  Their new signatures are: 1: // comparison is contravariant on type being compared 2: public delegate int Comparison<in T>(T lhs, T rhs); 3:  4: // converter is contravariant on input and covariant on output 5: public delegate TOutput Contravariant<in TInput, out TOutput>(TInput input); 6:  7: // predicate is contravariant on input 8: public delegate bool Predicate<in T>(T obj); Thus these delegates can now be assigned to delegates allowing for contravariance (going to a more derived type) or covariance (going to a less derived type) based on whether the parameters are input or output, respectively. Summary Today, we wrapped up our generic delegates discussion by looking at three lesser-used delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>.  All three of these tend to be replaced by their more generic Func equivalents in LINQ, but that doesn’t mean you shouldn’t understand what they do or can’t use them for your own code, as they do contain semantic meanings in their names that sometimes get lost in the more generic Func name.   Tweet Technorati Tags: C#,CSharp,.NET,Little Wonders,delegates,generics,Predicate,Converter,Comparison

    Read the article

  • C++ Little Wonders: The C++11 auto keyword redux

    - by James Michael Hare
    I’ve decided to create a sub-series of my Little Wonders posts to focus on C++.  Just like their C# counterparts, these posts will focus on those features of the C++ language that can help improve code by making it easier to write and maintain.  The index of the C# Little Wonders can be found here. This has been a busy week with a rollout of some new website features here at my work, so I don’t have a big post for this week.  But I wanted to write something up, and since lately I’ve been renewing my C++ skills in a separate project, it seemed like a good opportunity to start a C++ Little Wonders series.  Most of my development work still tends to focus on C#, but it was great to get back into the saddle and renew my C++ knowledge.  Today I’m going to focus on a new feature in C++11 (formerly known as C++0x, which is a major move forward in the C++ language standard).  While this small keyword can seem so trivial, I feel it is a big step forward in improving readability in C++ programs. The auto keyword If you’ve worked on C++ for a long time, you probably have some passing familiarity with the old auto keyword as one of those rarely used C++ keywords that was almost never used because it was the default. That is, in the code below (before C++11): 1: int foo() 2: { 3: // automatic variables (allocated and deallocated on stack) 4: int x; 5: auto int y; 6:  7: // static variables (retain their value across calls) 8: static int z; 9:  10: return 0; 11: } The variable x is assumed to be auto because that is the default, thus it is unnecessary to specify it explicitly as in the declaration of y below that.  Basically, an auto variable is one that is allocated and de-allocated on the stack automatically.  Contrast this to static variables, that are allocated statically and exist across the lifetime of the program. Because auto was so rarely (if ever) used since it is the norm, they decided to remove it for this purpose and give it new meaning in C++11.  The new meaning of auto: implicit typing Now, if your compiler supports C++ 11 (or at least a good subset of C++11 or 0x) you can take advantage of type inference in C++.  For those of you from the C# world, this means that the auto keyword in C++ now behaves a lot like the var keyword in C#! For example, many of us have had to declare those massive type declarations for an iterator before.  Let’s say we have a std::map of std::string to int which will map names to ages: 1: std::map<std::string, int> myMap; And then let’s say we want to find the age of a given person: 1: // Egad that's a long type... 2: std::map<std::string, int>::const_iterator pos = myMap.find(targetName); Notice that big ugly type definition to declare variable pos?  Sure, we could shorten this by creating a typedef of our specific map type if we wanted, but now with the auto keyword there’s no need: 1: // much shorter! 2: auto pos = myMap.find(targetName); The auto now tells the compiler to determine what type pos should be based on what it’s being assigned to.  This is not dynamic typing, it still determines the type as if it were explicitly declared and once declared that type cannot be changed.  That is, this is invalid: 1: // x is type int 2: auto x = 42; 3:  4: // can't assign string to int 5: x = "Hello"; Once the compiler determines x is type int it is exactly as if we typed int x = 42; instead, so don’t' confuse it with dynamic typing, it’s still very type-safe. An interesting feature of the auto keyword is that you can modify the inferred type: 1: // declare method that returns int* 2: int* GetPointer(); 3:  4: // p1 is int*, auto inferred type is int 5: auto *p1 = GetPointer(); 6:  7: // ps is int*, auto inferred type is int* 8: auto p2 = GetPointer(); Notice in both of these cases, p1 and p2 are determined to be int* but in each case the inferred type was different.  because we declared p1 as auto *p1 and GetPointer() returns int*, it inferred the type int was needed to complete the declaration.  In the second case, however, we declared p2 as auto p2 which means the inferred type was int*.  Ultimately, this make p1 and p2 the same type, but which type is inferred makes a difference, if you are chaining multiple inferred declarations together.  In these cases, the inferred type of each must match the first: 1: // Type inferred is int 2: // p1 is int* 3: // p2 is int 4: // p3 is int& 5: auto *p1 = GetPointer(), p2 = 42, &p3 = p2; Note that this works because the inferred type was int, if the inferred type was int* instead: 1: // syntax error, p1 was inferred to be int* so p2 and p3 don't make sense 2: auto p1 = GetPointer(), p2 = 42, &p3 = p2; You could also use const or static to modify the inferred type: 1: // inferred type is an int, theAnswer is a const int 2: const auto theAnswer = 42; 3:  4: // inferred type is double, Pi is a static double 5: static auto Pi = 3.1415927; Thus in the examples above it inferred the types int and double respectively, which were then modified to const and static. Summary The auto keyword has gotten new life in C++11 to allow you to infer the type of a variable from it’s initialization.  This simple little keyword can be used to cut down large declarations for complex types into a much more readable form, where appropriate.   Technorati Tags: C++, C++11, Little Wonders, auto

    Read the article

  • C#/.NET Little Wonders: The Timeout static class

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. When I started the “Little Wonders” series, I really wanted to pay homage to parts of the .NET Framework that are often small but can help in big ways.  The item I have to discuss today really is a very small item in the .NET BCL, but once again I feel it can help make the intention of code much clearer and thus is worthy of note. The Problem - Magic numbers aren’t very readable or maintainable In my first Little Wonders Post (Five Little Wonders That Make Code Better) I mention the TimeSpan factory methods which, I feel, really help the readability of constructed TimeSpan instances. Just to quickly recap that discussion, ask yourself what the TimeSpan specified in each case below is 1: // Five minutes? Five Seconds? 2: var fiveWhat1 = new TimeSpan(0, 0, 5); 3: var fiveWhat2 = new TimeSpan(0, 0, 5, 0); 4: var fiveWhat3 = new TimeSpan(0, 0, 5, 0, 0); You’d think they’d all be the same unit of time, right?  After all, most overloads tend to tack additional arguments on the end.  But this is not the case with TimeSpan, where the constructor forms are:     TimeSpan(int hours, int minutes, int seconds);     TimeSpan(int days, int hours, int minutes, int seconds);     TimeSpan(int days, int hours, int minutes, int seconds, int milliseconds); Notice how in the 4 and 5 parameter version we suddenly have the parameter days slipping in front of hours?  This can make reading constructors like those above much harder.  Fortunately, there are TimeSpan factory methods to help make your intention crystal clear: 1: // Ah! Much clearer! 2: var fiveSeconds = TimeSpan.FromSeconds(5); These are great because they remove all ambiguity from the reader!  So in short, magic numbers in constructors and methods can be ambiguous, and anything we can do to clean up the intention of the developer will make the code much easier to read and maintain. Timeout – Readable identifiers for infinite timeout values In a similar way to TimeSpan, let’s consider specifying timeouts for some of .NET’s (or our own) many methods that allow you to specify timeout periods. For example, in the TPL Task class, there is a family of Wait() methods that can take TimeSpan or int for timeouts.  Typically, if you want to specify an infinite timeout, you’d just call the version that doesn’t take a timeout parameter at all: 1: myTask.Wait(); // infinite wait But there are versions that take the int or TimeSpan for timeout as well: 1: // Wait for 100 ms 2: myTask.Wait(100); 3:  4: // Wait for 5 seconds 5: myTask.Wait(TimeSpan.FromSeconds(5); Now, if we want to specify an infinite timeout to wait on the Task, we could pass –1 (or a TimeSpan set to –1 ms), which what the .NET BCL methods with timeouts use to represent an infinite timeout: 1: // Also infinite timeouts, but harder to read/maintain 2: myTask.Wait(-1); 3: myTask.Wait(TimeSpan.FromMilliseconds(-1)); However, these are not as readable or maintainable.  If you were writing this code, you might make the mistake of thinking 0 or int.MaxValue was an infinite timeout, and you’d be incorrect.  Also, reading the code above it isn’t as clear that –1 is infinite unless you happen to know that is the specified behavior. To make the code like this easier to read and maintain, there is a static class called Timeout in the System.Threading namespace which contains definition for infinite timeouts specified as both int and TimeSpan forms: Timeout.Infinite An integer constant with a value of –1 Timeout.InfiniteTimeSpan A static readonly TimeSpan which represents –1 ms (only available in .NET 4.5+) This makes our calls to Task.Wait() (or any other calls with timeouts) much more clear: 1: // intention to wait indefinitely is quite clear now 2: myTask.Wait(Timeout.Infinite); 3: myTask.Wait(Timeout.InfiniteTimeSpan); But wait, you may say, why would we care at all?  Why not use the version of Wait() that takes no arguments?  Good question!  When you’re directly calling the method with an infinite timeout that’s what you’d most likely do, but what if you are just passing along a timeout specified by a caller from higher up?  Or perhaps storing a timeout value from a configuration file, and want to default it to infinite? For example, perhaps you are designing a communications module and want to be able to shutdown gracefully, but if you can’t gracefully finish in a specified amount of time you want to force the connection closed.  You could create a Shutdown() method in your class, and take a TimeSpan or an int for the amount of time to wait for a clean shutdown – perhaps waiting for client to acknowledge – before terminating the connection.  So, assume we had a pub/sub system with a class to broadcast messages: 1: // Some class to broadcast messages to connected clients 2: public class Broadcaster 3: { 4: // ... 5:  6: // Shutdown connection to clients, wait for ack back from clients 7: // until all acks received or timeout, whichever happens first 8: public void Shutdown(int timeout) 9: { 10: // Kick off a task here to send shutdown request to clients and wait 11: // for the task to finish below for the specified time... 12:  13: if (!shutdownTask.Wait(timeout)) 14: { 15: // If Wait() returns false, we timed out and task 16: // did not join in time. 17: } 18: } 19: } We could even add an overload to allow us to use TimeSpan instead of int, to give our callers the flexibility to specify timeouts either way: 1: // overload to allow them to specify Timeout in TimeSpan, would 2: // just call the int version passing in the TotalMilliseconds... 3: public void Shutdown(TimeSpan timeout) 4: { 5: Shutdown(timeout.TotalMilliseconds); 6: } Notice in case of this class, we don’t assume the caller wants infinite timeouts, we choose to rely on them to tell us how long to wait.  So now, if they choose an infinite timeout, they could use the –1, which is more cryptic, or use Timeout class to make the intention clear: 1: // shutdown the broadcaster, waiting until all clients ack back 2: // without timing out. 3: myBroadcaster.Shutdown(Timeout.Infinite); We could even add a default argument using the int parameter version so that specifying no arguments to Shutdown() assumes an infinite timeout: 1: // Modified original Shutdown() method to add a default of 2: // Timeout.Infinite, works because Timeout.Infinite is a compile 3: // time constant. 4: public void Shutdown(int timeout = Timeout.Infinite) 5: { 6: // same code as before 7: } Note that you can’t default the ShutDown(TimeSpan) overload with Timeout.InfiniteTimeSpan since it is not a compile-time constant.  The only acceptable default for a TimeSpan parameter would be default(TimeSpan) which is zero milliseconds, which specified no wait, not infinite wait. Summary While Timeout.Infinite and Timeout.InfiniteTimeSpan are not earth-shattering classes in terms of functionality, they do give you very handy and readable constant values that you can use in your programs to help increase readability and maintainability when specifying infinite timeouts for various timeouts in the BCL and your own applications. Technorati Tags: C#,CSharp,.NET,Little Wonders,Timeout,Task

    Read the article

  • C#/.NET &ndash; Finding an Item&rsquo;s Index in IEnumerable&lt;T&gt;

    - by James Michael Hare
    Sorry for the long blogging hiatus.  First it was, of course, the holidays hustle and bustle, then my brother and his wife gave birth to their son, so I’ve been away from my blogging for two weeks. Background: Finding an item’s index in List<T> is easy… Many times in our day to day programming activities, we want to find the index of an item in a collection.  Now, if we have a List<T> and we’re looking for the item itself this is trivial: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // can find the exact item using IndexOf() 5: var pos = list.IndexOf(64); This will return the position of the item if it’s found, or –1 if not.  It’s easy to see how this works for primitive types where equality is well defined.  For complex types, however, it will attempt to compare them using EqualityComparer<T>.Default which, in a nutshell, relies on the object’s Equals() method. So what if we want to search for a condition instead of equality?  That’s also easy in a List<T> with the FindIndex() method: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // finds index of first even number or -1 if not found. 5: var pos = list.FindIndex(i => i % 2 == 0);   Problem: Finding an item’s index in IEnumerable<T> is not so easy... This is all well and good for lists, but what if we want to do the same thing for IEnumerable<T>?  A collection of IEnumerable<T> has no indexing, so there’s no direct method to find an item’s index.  LINQ, as powerful as it is, gives us many tools to get us this information, but not in one step.  As with almost any problem involving collections, there are several ways to accomplish the same goal.  And once again as with almost any problem involving collections, the choice of the solution somewhat depends on the situation. So let’s look at a few possible alternatives.  I’m going to express each of these as extension methods for simplicity and consistency. Solution: The TakeWhile() and Count() combo One of the things you can do is to perform a TakeWhile() on the list as long as your find condition is not true, and then do a Count() of the items it took.  The only downside to this method is that if the item is not in the list, the index will be the full Count() of items, and not –1.  So if you don’t know the size of the list beforehand, this can be confusing. 1: // a collection of extra extension methods off IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item in the collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // note if item not found, result is length and not -1! 8: return list.TakeWhile(i => !finder(i)).Count(); 9: } 10: } Personally, I don’t like switching the paradigm of not found away from –1, so this is one of my least favorites.  Solution: Select with index Many people don’t realize that there is an alternative form of the LINQ Select() method that will provide you an index of the item being selected: 1: list.Select( (item,index) => do something here with the item and/or index... ) This can come in handy, but must be treated with care.  This is because the index provided is only as pertains to the result of previous operations (if any).  For example: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // you'd hope this would give you the indexes of the even numbers 5: // which would be 2, 3, 8, but in reality it gives you 0, 1, 2 6: list.Where(item => item % 2 == 0).Select((item,index) => index); The reason the example gives you the collection { 0, 1, 2 } is because the where clause passes over any items that are odd, and therefore only the even items are given to the select and only they are given indexes. Conversely, we can’t select the index and then test the item in a Where() clause, because then the Where() clause would be operating on the index and not the item! So, what we have to do is to select the item and index and put them together in an anonymous type.  It looks ugly, but it works: 1: // extensions defined on IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // finds an item in a collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // if you don't name the anonymous properties they are the variable names 8: return list.Select((item, index) => new { item, index }) 9: .Where(p => finder(p.item)) 10: .Select(p => p.index + 1) 11: .FirstOrDefault() - 1; 12: } 13: }     So let’s look at this, because i know it’s convoluted: First Select() joins the items and their indexes into an anonymous type. Where() filters that list to only the ones matching the predicate. Second Select() picks the index of the matches and adds 1 – this is to distinguish between not found and first item. FirstOrDefault() returns the first item found from the previous clauses or default (zero) if not found. Subtract one so that not found (zero) will be –1, and first item (one) will be zero. The bad thing is, this is ugly as hell and creates anonymous objects for each item tested until it finds the match.  This concerns me a bit but we’ll defer judgment until compare the relative performances below. Solution: Convert ToList() and use FindIndex() This solution is easy enough.  We know any IEnumerable<T> can be converted to List<T> using the LINQ extension method ToList(), so we can easily convert the collection to a list and then just use the FindIndex() method baked into List<T>. 1: // a collection of extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // find the index of an item in the collection similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: return list.ToList().FindIndex(finder); 8: } 9: } This solution is simplicity itself!  It is very concise and elegant and you need not worry about anyone misinterpreting what it’s trying to do (as opposed to the more convoluted LINQ methods above). But the main thing I’m concerned about here is the performance hit to allocate the List<T> in the ToList() call, but once again we’ll explore that in a second. Solution: Roll your own FindIndex() for IEnumerable<T> Of course, you can always roll your own FindIndex() method for IEnumerable<T>.  It would be a very simple for loop which scans for the item and counts as it goes.  There’s many ways to do this, but one such way might look like: 1: // extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item matching a predicate in the enumeration, much like List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: int index = 0; 8: foreach (var item in list) 9: { 10: if (finder(item)) 11: { 12: return index; 13: } 14:  15: index++; 16: } 17:  18: return -1; 19: } 20: } Well, it’s not quite simplicity, and those less familiar with LINQ may prefer it since it doesn’t include all of the lambdas and behind the scenes iterators that come with deferred execution.  But does having this long, blown out method really gain us much in performance? Comparison of Proposed Solutions So we’ve now seen four solutions, let’s analyze their collective performance.  I took each of the four methods described above and run them over 100,000 iterations of lists of size 10, 100, 1000, and 10000 and here’s the performance results.  Then I looked for targets at the begining of the list (best case), middle of the list (the average case) and not in the list (worst case as must scan all of the list). Each of the times below is the average time in milliseconds for one execution as computer over the 100,000 iterations: Searches Matching First Item (Best Case)   10 100 1000 10000 TakeWhile 0.0003 0.0003 0.0003 0.0003 Select 0.0005 0.0005 0.0005 0.0005 ToList 0.0002 0.0003 0.0013 0.0121 Manual 0.0001 0.0001 0.0001 0.0001   Searches Matching Middle Item (Average Case)   10 100 1000 10000 TakeWhile 0.0004 0.0020 0.0191 0.1889 Select 0.0008 0.0042 0.0387 0.3802 ToList 0.0002 0.0007 0.0057 0.0562 Manual 0.0002 0.0013 0.0129 0.1255   Searches Where Not Found (Worst Case)   10 100 1000 10000 TakeWhile 0.0006 0.0039 0.0381 0.3770 Select 0.0012 0.0081 0.0758 0.7583 ToList 0.0002 0.0012 0.0100 0.0996 Manual 0.0003 0.0026 0.0253 0.2514   Notice something interesting here, you’d think the “roll your own” loop would be the most efficient, but it only wins when the item is first (or very close to it) regardless of list size.  In almost all other cases though and in particular the average case and worst case, the ToList()/FindIndex() combo wins for performance, even though it is creating some temporary memory to hold the List<T>.  If you examine the algorithm, the reason why is most likely because once it’s in a ToList() form, internally FindIndex() scans the internal array which is much more efficient to iterate over.  Thus, it takes a one time performance hit (not including any GC impact) to create the List<T> but after that the performance is much better. Summary If you’re concerned about too many throw-away objects, you can always roll your own FindIndex() method, but for sheer simplicity and overall performance, using the ToList()/FindIndex() combo performs best on nearly all list sizes in the average and worst cases.    Technorati Tags: C#,.NET,Litte Wonders,BlackRabbitCoder,Software,LINQ,List

    Read the article

  • C#/.NET Little Wonders: Using &lsquo;default&rsquo; to Get Default Values

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Today’s little wonder is another of those small items that can help a lot in certain situations, especially when writing generics.  In particular, it is useful in determining what the default value of a given type would be. The Problem: what’s the default value for a generic type? There comes a time when you’re writing generic code where you may want to set an item of a given generic type.  Seems simple enough, right?  We’ll let’s see! Let’s say we want to query a Dictionary<TKey, TValue> for a given key and get back the value, but if the key doesn’t exist, we’d like a default value instead of throwing an exception. So, for example, we might have a the following dictionary defined: 1: var lookup = new Dictionary<int, string> 2: { 3: { 1, "Apple" }, 4: { 2, "Orange" }, 5: { 3, "Banana" }, 6: { 4, "Pear" }, 7: { 9, "Peach" } 8: }; And using those definitions, perhaps we want to do something like this: 1: // assume a default 2: string value = "Unknown"; 3:  4: // if the item exists in dictionary, get its value 5: if (lookup.ContainsKey(5)) 6: { 7: value = lookup[5]; 8: } But that’s inefficient, because then we’re double-hashing (once for ContainsKey() and once for the indexer).  Well, to avoid the double-hashing, we could use TryGetValue() instead: 1: string value; 2:  3: // if key exists, value will be put in value, if not default it 4: if (!lookup.TryGetValue(5, out value)) 5: { 6: value = "Unknown"; 7: } But the “flow” of using of TryGetValue() can get clunky at times when you just want to assign either the value or a default to a variable.  Essentially it’s 3-ish lines (depending on formatting) for 1 assignment.  So perhaps instead we’d like to write an extension method to support a cleaner interface that will return a default if the item isn’t found: 1: public static class DictionaryExtensions 2: { 3: public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dict, 4: TKey key, TValue defaultIfNotFound) 5: { 6: TValue value; 7:  8: // value will be the result or the default for TValue 9: if (!dict.TryGetValue(key, out value)) 10: { 11: value = defaultIfNotFound; 12: } 13:  14: return value; 15: } 16: } 17:  So this creates an extension method on Dictionary<TKey, TValue> that will attempt to get a value using the given key, and will return the defaultIfNotFound as a stand-in if the key does not exist. This code compiles, fine, but what if we would like to go one step further and allow them to specify a default if not found, or accept the default for the type?  Obviously, we could overload the method to take the default or not, but that would be duplicated code and a bit heavy for just specifying a default.  It seems reasonable that we could set the not found value to be either the default for the type, or the specified value. So what if we defaulted the type to null? 1: public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dict, 2: TKey key, TValue defaultIfNotFound = null) // ... No, this won’t work, because only reference types (and Nullable<T> wrapped types due to syntactical sugar) can be assigned to null.  So what about a calling parameterless constructor? 1: public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dict, 2: TKey key, TValue defaultIfNotFound = new TValue()) // ... No, this won’t work either for several reasons.  First, we’d expect a reference type to return null, not an “empty” instance.  Secondly, not all reference types have a parameter-less constructor (string for example does not).  And finally, a constructor cannot be determined at compile-time, while default values can. The Solution: default(T) – returns the default value for type T Many of us know the default keyword for its uses in switch statements as the default case.  But it has another use as well: it can return us the default value for a given type.  And since it generates the same defaults that default field initialization uses, it can be determined at compile-time as well. For example: 1: var x = default(int); // x is 0 2:  3: var y = default(bool); // y is false 4:  5: var z = default(string); // z is null 6:  7: var t = default(TimeSpan); // t is a TimeSpan with Ticks == 0 8:  9: var n = default(int?); // n is a Nullable<int> with HasValue == false Notice that for numeric types the default is 0, and for reference types the default is null.  In addition, for struct types, the value is a default-constructed struct – which simply means a struct where every field has their default value (hence 0 Ticks for TimeSpan, etc.). So using this, we could modify our code to this: 1: public static class DictionaryExtensions 2: { 3: public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dict, 4: TKey key, TValue defaultIfNotFound = default(TValue)) 5: { 6: TValue value; 7:  8: // value will be the result or the default for TValue 9: if (!dict.TryGetValue(key, out value)) 10: { 11: value = defaultIfNotFound; 12: } 13:  14: return value; 15: } 16: } Now, if defaultIfNotFound is unspecified, it will use default(TValue) which will be the default value for whatever value type the dictionary holds.  So let’s consider how we could use this: 1: lookup.GetValueOrDefault(1); // returns “Apple” 2:  3: lookup.GetValueOrDefault(5); // returns null 4:  5: lookup.GetValueOrDefault(5, “Unknown”); // returns “Unknown” 6:  Again, do not confuse a parameter-less constructor with the default value for a type.  Remember that the default value for any type is the compile-time default for any instance of that type (0 for numeric, false for bool, null for reference types, and struct will all default fields for struct).  Consider the difference: 1: // both zero 2: int i1 = default(int); 3: int i2 = new int(); 4:  5: // both “zeroed” structs 6: var dt1 = default(DateTime); 7: var dt2 = new DateTime(); 8:  9: // sb1 is null, sb2 is an “empty” string builder 10: var sb1 = default(StringBuilder()); 11: var sb2 = new StringBuilder(); So in the above code, notice that the value types all resolve the same whether using default or parameter-less construction.  This is because a value type is never null (even Nullable<T> wrapped types are never “null” in a reference sense), they will just by default contain fields with all default values. However, for reference types, the default is null and not a constructed instance.  Also it should be noted that not all classes have parameter-less constructors (string, for instance, doesn’t have one – and doesn’t need one). Summary Whenever you need to get the default value for a type, especially a generic type, consider using the default keyword.  This handy word will give you the default value for the given type at compile-time, which can then be used for initialization, optional parameters, etc. Technorati Tags: C#,CSharp,.NET,Little Wonders,default

    Read the article

  • C#: LINQ vs foreach - Round 1.

    - by James Michael Hare
    So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool.  But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ?    The answer is not really clear-cut.  There are two sides to any code cost arguments: performance and maintainability.  The first of these is obvious and quantifiable.  Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better.   Unfortunately, this is not always a good measure.  Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages.  In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost.   Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library.   Well, before we discuss any further, let's look at some sample code and the numbers.  First, let's take a look at the for loop and the LINQ expression.  This is just a simple find comparison:       // find implemented via LINQ     public static bool FindViaLinq(IEnumerable<int> list, int target)     {         return list.Any(item => item == target);     }         // find implemented via standard iteration     public static bool FindViaIteration(IEnumerable<int> list, int target)     {         foreach (var i in list)         {             if (i == target)             {                 return true;             }         }           return false;     }   Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention.  You don't have to actually analyze the behavior of the loop to determine what it's doing.   So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data.  For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches.                       List<T> On 100,000 Iterations     Method      Size     Total (ms)  Per Iteration (ms)  % Slower     Any         10       26          0.00046             30.00%     Iteration   10       20          0.00023             -     Any         100      116         0.00201             18.37%     Iteration   100      98          0.00118             -     Any         1000     1058        0.01853             16.78%     Iteration   1000     906         0.01155             -     Any         10,000   10,383      0.18189             17.41%     Iteration   10,000   8843        0.11362             -     Any         100,000  104,004     1.8297              18.27%     Iteration   100,000  87,941      1.13163             -   The LINQ expression is running about 17% slower for average size collections and worse for smaller collections.  Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this.   So what about other LINQ expressions?  After all, Any() is one of the more trivial ones.  I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ:       // Linq form     public static int GetTargetPosition1(IEnumerable<int> list, int target)     {         return list.TakeWhile(item => item != target).Count();     }       // traditionally iterative form     public static int GetTargetPosition2(IEnumerable<int> list, int target)     {         int count = 0;           foreach (var i in list)         {             if(i == target)             {                 break;             }               ++count;         }           return count;     }   Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt.  So I ran these through the same test data:                       List<T> On 100,000 Iterations     Method      Size     Total (ms)  Per Iteration (ms)  % Slower     TakeWhile   10       41          0.00041             128%     Iteration   10       18          0.00018             -     TakeWhile   100      171         0.00171             88%     Iteration   100      91          0.00091             -     TakeWhile   1000     1604        0.01604             94%     Iteration   1000     825         0.00825             -     TakeWhile   10,000   15765       0.15765             92%     Iteration   10,000   8204        0.08204             -     TakeWhile   100,000  156950      1.5695              92%     Iteration   100,000  81635       0.81635             -     Wow!  I expected some overhead due to the state machines iterators produce, but 90% slower?  That seems a little heavy to me.  So then I thought, well, what if TakeWhile() is not the right tool for the job?  The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem?  Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way.  This mostly matches Any().       // a new method that uses linq but evaluates the count in a closure.     public static int TakeWhileViaLinq2(IEnumerable<int> list, int target)     {         int count = 0;         list.Any(item =>             {                 if(item == target)                 {                     return true;                 }                   ++count;                 return false;             });         return count;     }     Now how does this one compare?                         List<T> On 100,000 Iterations     Method         Size     Total (ms)  Per Iteration (ms)  % Slower     TakeWhile      10       41          0.00041             128%     Any w/Closure  10       23          0.00023             28%     Iteration      10       18          0.00018             -     TakeWhile      100      171         0.00171             88%     Any w/Closure  100      116         0.00116             27%     Iteration      100      91          0.00091             -     TakeWhile      1000     1604        0.01604             94%     Any w/Closure  1000     1101        0.01101             33%     Iteration      1000     825         0.00825             -     TakeWhile      10,000   15765       0.15765             92%     Any w/Closure  10,000   10802       0.10802             32%     Iteration      10,000   8204        0.08204             -     TakeWhile      100,000  156950      1.5695              92%     Any w/Closure  100,000  108378      1.08378             33%     Iteration      100,000  81635       0.81635             -     Much better!  It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do.   So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm.  But this is true of any choice of algorithm or collection in general.     Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully.  For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>.  That's really not that bad at all in the grand scheme of things.  Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay.  And if your typical list is 1000 items or less we're talking only microseconds worth of difference.   It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off.  So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly.   If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off.  And if I ever do need that last millisecond of performance?  Well then I'll optimize JUST THAT problem spot.  To me it's worth it for the readability, speed-to-market, and maintainability.

    Read the article

  • C#: Does an IDisposable in a Halted Iterator Dispose?

    - by James Michael Hare
    If that sounds confusing, let me give you an example. Let's say you expose a method to read a database of products, and instead of returning a List<Product> you return an IEnumerable<Product> in iterator form (yield return). This accomplishes several good things: The IDataReader is not passed out of the Data Access Layer which prevents abstraction leak and resource leak potentials. You don't need to construct a full List<Product> in memory (which could be very big) if you just want to forward iterate once. If you only want to consume up to a certain point in the list, you won't incur the database cost of looking up the other items. This could give us an example like: 1: // a sample data access object class to do standard CRUD operations. 2: public class ProductDao 3: { 4: private DbProviderFactory _factory = SqlClientFactory.Instance 5:  6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: // must create the connection 10: using (var con = _factory.CreateConnection()) 11: { 12: con.ConnectionString = _productsConnectionString; 13: con.Open(); 14:  15: // create the command 16: using (var cmd = _factory.CreateCommand()) 17: { 18: cmd.Connection = con; 19: cmd.CommandText = _getAllProductsStoredProc; 20: cmd.CommandType = CommandType.StoredProcedure; 21:  22: // get a reader and pass back all results 23: using (var reader = cmd.ExecuteReader()) 24: { 25: while(reader.Read()) 26: { 27: yield return new Product 28: { 29: Name = reader["product_name"].ToString(), 30: ... 31: }; 32: } 33: } 34: } 35: } 36: } 37: } The database details themselves are irrelevant. I will say, though, that I'm a big fan of using the System.Data.Common classes instead of your provider specific counterparts directly (SqlCommand, OracleCommand, etc). This lets you mock your data sources easily in unit testing and also allows you to swap out your provider in one line of code. In fact, one of the shared components I'm most proud of implementing was our group's DatabaseUtility library that simplifies all the database access above into one line of code in a thread-safe and provider-neutral way. I went with my own flavor instead of the EL due to the fact I didn't want to force internal company consumers to use the EL if they didn't want to, and it made it easy to allow them to mock their database for unit testing by providing a MockCommand, MockConnection, etc that followed the System.Data.Common model. One of these days I'll blog on that if anyone's interested. Regardless, you often have situations like the above where you are consuming and iterating through a resource that must be closed once you are finished iterating. For the reasons stated above, I didn't want to return IDataReader (that would force them to remember to Dispose it), and I didn't want to return List<Product> (that would force them to hold all products in memory) -- but the first time I wrote this, I was worried. What if you never consume the last item and exit the loop? Are the reader, command, and connection all disposed correctly? Of course, I was 99.999999% sure the creators of C# had already thought of this and taken care of it, but inspection in Reflector was difficult due to the nature of the state machines yield return generates, so I decided to try a quick example program to verify whether or not Dispose() will be called when an iterator is broken from outside the iterator itself -- i.e. before the iterator reports there are no more items. So I wrote a quick Sequencer class with a Dispose() method and an iterator for it. Yes, it is COMPLETELY contrived: 1: // A disposable sequence of int -- yes this is completely contrived... 2: internal class Sequencer : IDisposable 3: { 4: private int _i = 0; 5: private readonly object _mutex = new object(); 6:  7: // Constructs an int sequence. 8: public Sequencer(int start) 9: { 10: _i = start; 11: } 12:  13: // Gets the next integer 14: public int GetNext() 15: { 16: lock (_mutex) 17: { 18: return _i++; 19: } 20: } 21:  22: // Dispose the sequence of integers. 23: public void Dispose() 24: { 25: // force output immediately (flush the buffer) 26: Console.WriteLine("Disposed with last sequence number of {0}!", _i); 27: Console.Out.Flush(); 28: } 29: } And then I created a generator (infinite-loop iterator) that did the using block for auto-Disposal: 1: // simply defines an extension method off of an int to start a sequence 2: public static class SequencerExtensions 3: { 4: // generates an infinite sequence starting at the specified number 5: public static IEnumerable<int> GetSequence(this int starter) 6: { 7: // note the using here, will call Dispose() when block terminated. 8: using (var seq = new Sequencer(starter)) 9: { 10: // infinite loop on this generator, means must be bounded by caller! 11: while(true) 12: { 13: yield return seq.GetNext(); 14: } 15: } 16: } 17: } This is really the same conundrum as the database problem originally posed. Here we are using iteration (yield return) over a large collection (infinite sequence of integers). If we cut the sequence short by breaking iteration, will that using block exit and hence, Dispose be called? Well, let's see: 1: // The test program class 2: public class IteratorTest 3: { 4: // The main test method. 5: public static void Main() 6: { 7: Console.WriteLine("Going to consume 10 of infinite items"); 8: Console.Out.Flush(); 9:  10: foreach(var i in 0.GetSequence()) 11: { 12: // could use TakeWhile, but wanted to output right at break... 13: if(i >= 10) 14: { 15: Console.WriteLine("Breaking now!"); 16: Console.Out.Flush(); 17: break; 18: } 19:  20: Console.WriteLine(i); 21: Console.Out.Flush(); 22: } 23:  24: Console.WriteLine("Done with loop."); 25: Console.Out.Flush(); 26: } 27: } So, what do we see? Do we see the "Disposed" message from our dispose, or did the Dispose get skipped because from an "eyeball" perspective we should be locked in that infinite generator loop? Here's the results: 1: Going to consume 10 of infinite items 2: 0 3: 1 4: 2 5: 3 6: 4 7: 5 8: 6 9: 7 10: 8 11: 9 12: Breaking now! 13: Disposed with last sequence number of 11! 14: Done with loop. Yes indeed, when we break the loop, the state machine that C# generates for yield iterate exits the iteration through the using blocks and auto-disposes the IDisposable correctly. I must admit, though, the first time I wrote one, I began to wonder and that led to this test. If you've never seen iterators before (I wrote a previous entry here) the infinite loop may throw you, but you have to keep in mind it is not a linear piece of code, that every time you hit a "yield return" it cedes control back to the state machine generated for the iterator. And this state machine, I'm happy to say, is smart enough to clean up the using blocks correctly. I suspected those wily guys and gals at Microsoft engineered it well, and I wasn't disappointed. But, I've been bitten by assumptions before, so it's good to test and see. Yes, maybe you knew it would or figured it would, but isn't it nice to know? And as those campy 80s G.I. Joe cartoon public service reminders always taught us, "Knowing is half the battle...". Technorati Tags: C#,.NET

    Read the article

  • C#/.NET Little Wonders: Tuples and Tuple Factory Methods

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can really help improve your code by making it easier to write and maintain.  This week, we look at the System.Tuple class and the handy factory methods for creating a Tuple by inferring the types. What is a Tuple? The System.Tuple is a class that tends to inspire a reaction in one of two ways: love or hate.  Simply put, a Tuple is a data structure that holds a specific number of items of a specific type in a specific order.  That is, a Tuple<int, string, int> is a tuple that contains exactly three items: an int, followed by a string, followed by an int.  The sequence is important not only to distinguish between two members of the tuple with the same type, but also for comparisons between tuples.  Some people tend to love tuples because they give you a quick way to combine multiple values into one result.  This can be handy for returning more than one value from a method (without using out or ref parameters), or for creating a compound key to a Dictionary, or any other purpose you can think of.  They can be especially handy when passing a series of items into a call that only takes one object parameter, such as passing an argument to a thread's startup routine.  In these cases, you do not need to define a class, simply create a tuple containing the types you wish to return, and you are ready to go? On the other hand, there are some people who see tuples as a crutch in object-oriented design.  They may view the tuple as a very watered down class with very little inherent semantic meaning.  As an example, what if you saw this in a piece of code: 1: var x = new Tuple<int, int>(2, 5); What are the contents of this tuple?  If the tuple isn't named appropriately, and if the contents of each member are not self evident from the type this can be a confusing question.  The people who tend to be against tuples would rather you explicitly code a class to contain the values, such as: 1: public sealed class RetrySettings 2: { 3: public int TimeoutSeconds { get; set; } 4: public int MaxRetries { get; set; } 5: } Here, the meaning of each int in the class is much more clear, but it's a bit more work to create the class and can clutter a solution with extra classes. So, what's the correct way to go?  That's a tough call.  You will have people who will argue quite well for one or the other.  For me, I consider the Tuple to be a tool to make it easy to collect values together easily.  There are times when I just need to combine items for a key or a result, in which case the tuple is short lived and so the meaning isn't easily lost and I feel this is a good compromise.  If the scope of the collection of items, though, is more application-wide I tend to favor creating a full class. Finally, it should be noted that tuples are immutable.  That means they are assigned a value at construction, and that value cannot be changed.  Now, of course if the tuple contains an item of a reference type, this means that the reference is immutable and not the item referred to. Tuples from 1 to N Tuples come in all sizes, you can have as few as one element in your tuple, or as many as you like.  However, since C# generics can't have an infinite generic type parameter list, any items after 7 have to be collapsed into another tuple, as we'll show shortly. So when you declare your tuple from sizes 1 (a 1-tuple or singleton) to 7 (a 7-tuple or septuple), simply include the appropriate number of type arguments: 1: // a singleton tuple of integer 2: Tuple<int> x; 3:  4: // or more 5: Tuple<int, double> y; 6:  7: // up to seven 8: Tuple<int, double, char, double, int, string, uint> z; Anything eight and above, and we have to nest tuples inside of tuples.  The last element of the 8-tuple is the generic type parameter Rest, this is special in that the Tuple checks to make sure at runtime that the type is a Tuple.  This means that a simple 8-tuple must nest a singleton tuple (one of the good uses for a singleton tuple, by the way) for the Rest property. 1: // an 8-tuple 2: Tuple<int, int, int, int, int, double, char, Tuple<string>> t8; 3:  4: // an 9-tuple 5: Tuple<int, int, int, int, double, int, char, Tuple<string, DateTime>> t9; 6:  7: // a 16-tuple 8: Tuple<int, int, int, int, int, int, int, Tuple<int, int, int, int, int, int, int, Tuple<int,int>>> t14; Notice that on the 14-tuple we had to have a nested tuple in the nested tuple.  Since the tuple can only support up to seven items, and then a rest element, that means that if the nested tuple needs more than seven items you must nest in it as well.  Constructing tuples Constructing tuples is just as straightforward as declaring them.  That said, you have two distinct ways to do it.  The first is to construct the tuple explicitly yourself: 1: var t3 = new Tuple<int, string, double>(1, "Hello", 3.1415927); This creates a triple that has an int, string, and double and assigns the values 1, "Hello", and 3.1415927 respectively.  Make sure the order of the arguments supplied matches the order of the types!  Also notice that we can't half-assign a tuple or create a default tuple.  Tuples are immutable (you can't change the values once constructed), so thus you must provide all values at construction time. Another way to easily create tuples is to do it implicitly using the System.Tuple static class's Create() factory methods.  These methods (much like C++'s std::make_pair method) will infer the types from the method call so you don't have to type them in.  This can dramatically reduce the amount of typing required especially for complex tuples! 1: // this 4-tuple is typed Tuple<int, double, string, char> 2: var t4 = Tuple.Create(42, 3.1415927, "Love", 'X'); Notice how much easier it is to use the factory methods and infer the types?  This can cut down on typing quite a bit when constructing tuples.  The Create() factory method can construct from a 1-tuple (singleton) to an 8-tuple (octuple), which of course will be a octuple where the last item is a singleton as we described before in nested tuples. Accessing tuple members Accessing a tuple's members is simplicity itself… mostly.  The properties for accessing up to the first seven items are Item1, Item2, …, Item7.  If you have an octuple or beyond, the final property is Rest which will give you the nested tuple which you can then access in a similar matter.  Once again, keep in mind that these are read-only properties and cannot be changed. 1: // for septuples and below, use the Item properties 2: var t1 = Tuple.Create(42, 3.14); 3:  4: Console.WriteLine("First item is {0} and second is {1}", 5: t1.Item1, t1.Item2); 6:  7: // for octuples and above, use Rest to retrieve nested tuple 8: var t9 = new Tuple<int, int, int, int, int, int, int, 9: Tuple<int, int>>(1,2,3,4,5,6,7,Tuple.Create(8,9)); 10:  11: Console.WriteLine("The 8th item is {0}", t9.Rest.Item1); Tuples are IStructuralComparable and IStructuralEquatable Most of you know about IComparable and IEquatable, what you may not know is that there are two sister interfaces to these that were added in .NET 4.0 to help support tuples.  These IStructuralComparable and IStructuralEquatable make it easy to compare two tuples for equality and ordering.  This is invaluable for sorting, and makes it easy to use tuples as a compound-key to a dictionary (one of my favorite uses)! Why is this so important?  Remember when we said that some folks think tuples are too generic and you should define a custom class?  This is all well and good, but if you want to design a custom class that can automatically order itself based on its members and build a hash code for itself based on its members, it is no longer a trivial task!  Thankfully the tuple does this all for you through the explicit implementations of these interfaces. For equality, two tuples are equal if all elements are equal between the two tuples, that is if t1.Item1 == t2.Item1 and t1.Item2 == t2.Item2, and so on.  For ordering, it's a little more complex in that it compares the two tuples one at a time starting at Item1, and sees which one has a smaller Item1.  If one has a smaller Item1, it is the smaller tuple.  However if both Item1 are the same, it compares Item2 and so on. For example: 1: var t1 = Tuple.Create(1, 3.14, "Hi"); 2: var t2 = Tuple.Create(1, 3.14, "Hi"); 3: var t3 = Tuple.Create(2, 2.72, "Bye"); 4:  5: // true, t1 == t2 because all items are == 6: Console.WriteLine("t1 == t2 : " + t1.Equals(t2)); 7:  8: // false, t1 != t2 because at least one item different 9: Console.WriteLine("t2 == t2 : " + t2.Equals(t3)); The actual implementation of IComparable, IEquatable, IStructuralComparable, and IStructuralEquatable is explicit, so if you want to invoke the methods defined there you'll have to manually cast to the appropriate interface: 1: // true because t1.Item1 < t3.Item1, if had been same would check Item2 and so on 2: Console.WriteLine("t1 < t3 : " + (((IComparable)t1).CompareTo(t3) < 0)); So, as I mentioned, the fact that tuples are automatically equatable and comparable (provided the types you use define equality and comparability as needed) means that we can use tuples for compound keys in hashing and ordering containers like Dictionary and SortedList: 1: var tupleDict = new Dictionary<Tuple<int, double, string>, string>(); 2:  3: tupleDict.Add(t1, "First tuple"); 4: tupleDict.Add(t2, "Second tuple"); 5: tupleDict.Add(t3, "Third tuple"); Because IEquatable defines GetHashCode(), and Tuple's IStructuralEquatable implementation creates this hash code by combining the hash codes of the members, this makes using the tuple as a complex key quite easy!  For example, let's say you are creating account charts for a financial application, and you want to cache those charts in a Dictionary based on the account number and the number of days of chart data (for example, a 1 day chart, 1 week chart, etc): 1: // the account number (string) and number of days (int) are key to get cached chart 2: var chartCache = new Dictionary<Tuple<string, int>, IChart>(); Summary The System.Tuple, like any tool, is best used where it will achieve a greater benefit.  I wouldn't advise overusing them, on objects with a large scope or it can become difficult to maintain.  However, when used properly in a well defined scope they can make your code cleaner and easier to maintain by removing the need for extraneous POCOs and custom property hashing and ordering. They are especially useful in defining compound keys to IDictionary implementations and for returning multiple values from methods, or passing multiple values to a single object parameter. Tweet Technorati Tags: C#,.NET,Tuple,Little Wonders

    Read the article

  • C# Extension Methods - To Extend or Not To Extend...

    - by James Michael Hare
    I've been thinking a lot about extension methods lately, and I must admit I both love them and hate them. They are a lot like sugar, they taste so nice and sweet, but they'll rot your teeth if you eat them too much.   I can't deny that they aren't useful and very handy. One of the major components of the Shared Component library where I work is a set of useful extension methods. But, I also can't deny that they tend to be overused and abused to willy-nilly extend every living type.   So what constitutes a good extension method? Obviously, you can write an extension method for nearly anything whether it is a good idea or not. Many times, in fact, an idea seems like a good extension method but in retrospect really doesn't fit.   So what's the litmus test? To me, an extension method should be like in the movies when a person runs into their twin, separated at birth. You just know you're related. Obviously, that's hard to quantify, so let's try to put a few rules-of-thumb around them.   A good extension method should:     Apply to any possible instance of the type it extends.     Simplify logic and improve readability/maintainability.     Apply to the most specific type or interface applicable.     Be isolated in a namespace so that it does not pollute IntelliSense.     So let's look at a few examples in relation to these rules.   The first rule, to me, is the most important of all. Once again, it bears repeating, a good extension method should apply to all possible instances of the type it extends. It should feel like the long lost relative that should have been included in the original class but somehow was missing from the family tree.    Take this nifty little int extension, I saw this once in a blog and at first I really thought it was pretty cool, but then I started noticing a code smell I couldn't quite put my finger on. So let's look:       public static class IntExtensinos     {         public static int Seconds(int num)         {             return num * 1000;         }           public static int Minutes(int num)         {             return num * 60000;         }     }     This is so you could do things like:       ...     Thread.Sleep(5.Seconds());     ...     proxy.Timeout = 1.Minutes();     ...     Awww, you say, that's cute! Well, that's the problem, it's kitschy and it doesn't always apply (and incidentally you could achieve the same thing with TimeStamp.FromSeconds(5)). It's syntactical candy that looks cool, but tends to rot and pollute the code. It would allow things like:       total += numberOfTodaysOrders.Seconds();     which makes no sense and should never be allowed. The problem is you're applying an extension method to a logical domain, not a type domain. That is, the extension method Seconds() doesn't really apply to ALL ints, it applies to ints that are representative of time that you want to convert to milliseconds.    Do you see what I mean? The two problems, in a nutshell, are that a) Seconds() called off a non-time value makes no sense and b) calling Seconds() off something to pass to something that does not take milliseconds will be off by a factor of 1000 or worse.   Thus, in my mind, you should only ever have an extension method that applies to the whole domain of that type.   For example, this is one of my personal favorites:       public static bool IsBetween<T>(this T value, T low, T high)         where T : IComparable<T>     {         return value.CompareTo(low) >= 0 && value.CompareTo(high) <= 0;     }   This allows you to check if any IComparable<T> is within an upper and lower bound. Think of how many times you type something like:       if (response.Employee.Address.YearsAt >= 2         && response.Employee.Address.YearsAt <= 10)     {     ...     }     Now, you can instead type:       if(response.Employee.Address.YearsAt.IsBetween(2, 10))     {     ...     }     Note that this applies to all IComparable<T> -- that's ints, chars, strings, DateTime, etc -- and does not depend on any logical domain. In addition, it satisfies the second point and actually makes the code more readable and maintainable.   Let's look at the third point. In it we said that an extension method should fit the most specific interface or type possible. Now, I'm not saying if you have something that applies to enumerables, you create an extension for List, Array, Dictionary, etc (though you may have reasons for doing so), but that you should beware of making things TOO general.   For example, let's say we had an extension method like this:       public static T ConvertTo<T>(this object value)     {         return (T)Convert.ChangeType(value, typeof(T));     }         This lets you do more fluent conversions like:       double d = "5.0".ConvertTo<double>();     However, if you dig into Reflector (LOVE that tool) you will see that if the type you are calling on does not implement IConvertible, what you convert to MUST be the exact type or it will throw an InvalidCastException. Now this may or may not be what you want in this situation, and I leave that up to you. Things like this would fail:       object value = new Employee();     ...     // class cast exception because typeof(IEmployee) != typeof(Employee)     IEmployee emp = value.ConvertTo<IEmployee>();       Yes, that's a downfall of working with Convertible in general, but if you wanted your fluent interface to be more type-safe so that ConvertTo were only callable on IConvertibles (and let casting be a manual task), you could easily make it:         public static T ConvertTo<T>(this IConvertible value)     {         return (T)Convert.ChangeType(value, typeof(T));     }         This is what I mean by choosing the best type to extend. Consider that if we used the previous (object) version, every time we typed a dot ('.') on an instance we'd pull up ConvertTo() whether it was applicable or not. By filtering our extension method down to only valid types (those that implement IConvertible) we greatly reduce our IntelliSense pollution and apply a good level of compile-time correctness.   Now my fourth rule is just my general rule-of-thumb. Obviously, you can make extension methods as in-your-face as you want. I included all mine in my work libraries in its own sub-namespace, something akin to:       namespace Shared.Core.Extensions { ... }     This is in a library called Shared.Core, so just referencing the Core library doesn't pollute your IntelliSense, you have to actually do a using on Shared.Core.Extensions to bring the methods in. This is very similar to the way Microsoft puts its extension methods in System.Linq. This way, if you want 'em, you use the appropriate namespace. If you don't want 'em, they won't pollute your namespace.   To really make this work, however, that namespace should only include extension methods and subordinate types those extensions themselves may use. If you plant other useful classes in those namespaces, once a user includes it, they get all the extensions too.   Also, just as a personal preference, extension methods that aren't simply syntactical shortcuts, I like to put in a static utility class and then have extension methods for syntactical candy. For instance, I think it imaginable that any object could be converted to XML:       namespace Shared.Core     {         // A collection of XML Utility classes         public static class XmlUtility         {             ...             // Serialize an object into an xml string             public static string ToXml(object input)             {                 var xs = new XmlSerializer(input.GetType());                   // use new UTF8Encoding here, not Encoding.UTF8. The later includes                 // the BOM which screws up subsequent reads, the former does not.                 using (var memoryStream = new MemoryStream())                 using (var xmlTextWriter = new XmlTextWriter(memoryStream, new UTF8Encoding()))                 {                     xs.Serialize(xmlTextWriter, input);                     return Encoding.UTF8.GetString(memoryStream.ToArray());                 }             }             ...         }     }   I also wanted to be able to call this from an object like:       value.ToXml();     But here's the problem, if i made this an extension method from the start with that one little keyword "this", it would pop into IntelliSense for all objects which could be very polluting. Instead, I put the logic into a utility class so that users have the choice of whether or not they want to use it as just a class and not pollute IntelliSense, then in my extensions namespace, I add the syntactical candy:       namespace Shared.Core.Extensions     {         public static class XmlExtensions         {             public static string ToXml(this object value)             {                 return XmlUtility.ToXml(value);             }         }     }   So now it's the best of both worlds. On one hand, they can use the utility class if they don't want to pollute IntelliSense, and on the other hand they can include the Extensions namespace and use as an extension if they want. The neat thing is it also adheres to the Single Responsibility Principle. The XmlUtility is responsible for converting objects to XML, and the XmlExtensions is responsible for extending object's interface for ToXml().

    Read the article

  • C#: Why Decorate When You Can Intercept

    - by James Michael Hare
    We've all heard of the old Decorator Design Pattern (here) or used it at one time or another either directly or indirectly.  A decorator is a class that wraps a given abstract class or interface and presents the same (or a superset) public interface but "decorated" with additional functionality.   As a really simplistic example, consider the System.IO.BufferedStream, it itself is a descendent of System.IO.Stream and wraps the given stream with buffering logic while still presenting System.IO.Stream's public interface:   1: Stream buffStream = new BufferedStream(rawStream); Now, let's take a look at a custom-code example.  Let's say that we have a class in our data access layer that retrieves a list of products from a database:  1: // a class that handles our CRUD operations for products 2: public class ProductDao 3: { 4: ... 5:  6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: var results = new List<Product>(); 10:  11: // must create the connection 12: using (var con = _factory.CreateConnection()) 13: { 14: con.ConnectionString = _productsConnectionString; 15: con.Open(); 16:  17: // create the command 18: using (var cmd = _factory.CreateCommand()) 19: { 20: cmd.Connection = con; 21: cmd.CommandText = _getAllProductsStoredProc; 22: cmd.CommandType = CommandType.StoredProcedure; 23:  24: // get a reader and pass back all results 25: using (var reader = cmd.ExecuteReader()) 26: { 27: while(reader.Read()) 28: { 29: results.Add(new Product 30: { 31: Name = reader["product_name"].ToString(), 32: ... 33: }); 34: } 35: } 36: } 37: }            38:  39: return results; 40: } 41: } Yes, you could use EF or any myriad other choices for this sort of thing, but the germaine point is that you have some operation that takes a non-trivial amount of time.  What if, during the production day I notice that my application is performing slowly and I want to see how much of that slowness is in the query versus my code.  Well, I could easily wrap the logic block in a System.Diagnostics.Stopwatch and log the results to log4net or other logging flavor of choice: 1:     // a class that handles our CRUD operations for products 2:     public class ProductDao 3:     { 4:         private static readonly ILog _log = LogManager.GetLogger(typeof(ProductDao)); 5:         ... 6:         7:         // a method that would retrieve all available products 8:         public IEnumerable<Product> GetAvailableProducts() 9:         { 10:             var results = new List<Product>(); 11:             var timer = Stopwatch.StartNew(); 12:             13:             // must create the connection 14:             using (var con = _factory.CreateConnection()) 15:             { 16:                 con.ConnectionString = _productsConnectionString; 17:                 18:                 // and all that other DB code... 19:                 ... 20:             } 21:             22:             timer.Stop(); 23:             24:             if (timer.ElapsedMilliseconds > 5000) 25:             { 26:                 _log.WarnFormat("Long query in GetAvailableProducts() took {0} ms", 27:                     timer.ElapsedMillseconds); 28:             } 29:             30:             return results; 31:         } 32:     } In my eye, this is very ugly.  It violates Single Responsibility Principle (SRP), which says that a class should only ever have one responsibility, where responsibility is often defined as a reason to change.  This class (and in particular this method) has two reasons to change: If the method of retrieving products changes. If the method of logging changes. Well, we could “simplify” this using the Decorator Design Pattern (here).  If we followed the pattern to the letter, we'd need to create a base decorator that implements the DAOs public interface and forwards to the wrapped instance.  So let's assume we break out the ProductDAO interface into IProductDAO using your refactoring tool of choice (Resharper is great for this). Now, ProductDao will implement IProductDao and get rid of all logging logic: 1:     public class ProductDao : IProductDao 2:     { 3:         // this reverts back to original version except for the interface added 4:     } 5:  And we create the base Decorator that also implements the interface and forwards all calls: 1:     public class ProductDaoDecorator : IProductDao 2:     { 3:         private readonly IProductDao _wrappedDao; 4:         5:         // constructor takes the dao to wrap 6:         public ProductDaoDecorator(IProductDao wrappedDao) 7:         { 8:             _wrappedDao = wrappedDao; 9:         } 10:         11:         ... 12:         13:         // and then all methods just forward their calls 14:         public IEnumerable<Product> GetAvailableProducts() 15:         { 16:             return _wrappedDao.GetAvailableProducts(); 17:         } 18:     } This defines our base decorator, then we can create decorators that add items of interest, and for any methods we don't decorate, we'll get the default behavior which just forwards the call to the wrapper in the base decorator: 1:     public class TimedThresholdProductDaoDecorator : ProductDaoDecorator 2:     { 3:         private static readonly ILog _log = LogManager.GetLogger(typeof(TimedThresholdProductDaoDecorator)); 4:         5:         public TimedThresholdProductDaoDecorator(IProductDao wrappedDao) : 6:             base(wrappedDao) 7:         { 8:         } 9:         10:         ... 11:         12:         public IEnumerable<Product> GetAvailableProducts() 13:         { 14:             var timer = Stopwatch.StartNew(); 15:             16:             var results = _wrapped.GetAvailableProducts(); 17:             18:             timer.Stop(); 19:             20:             if (timer.ElapsedMilliseconds > 5000) 21:             { 22:                 _log.WarnFormat("Long query in GetAvailableProducts() took {0} ms", 23:                     timer.ElapsedMillseconds); 24:             } 25:             26:             return results; 27:         } 28:     } Well, it's a bit better.  Now the logging is in its own class, and the database logic is in its own class.  But we've essentially multiplied the number of classes.  We now have 3 classes and one interface!  Now if you want to do that same logging decorating on all your DAOs, imagine the code bloat!  Sure, you can simplify and avoid creating the base decorator, or chuck it all and just inherit directly.  But regardless all of these have the problem of tying the logging logic into the code itself. Enter the Interceptors.  Things like this to me are a perfect example of when it's good to write an Interceptor using your class library of choice.  Sure, you could design your own perfectly generic decorator with delegates and all that, but personally I'm a big fan of Castle's Dynamic Proxy (here) which is actually used by many projects including Moq. What DynamicProxy allows you to do is intercept calls into any object by wrapping it with a proxy on the fly that intercepts the method and allows you to add functionality.  Essentially, the code would now look like this using DynamicProxy: 1: // Note: I like hiding DynamicProxy behind the scenes so users 2: // don't have to explicitly add reference to Castle's libraries. 3: public static class TimeThresholdInterceptor 4: { 5: // Our logging handle 6: private static readonly ILog _log = LogManager.GetLogger(typeof(TimeThresholdInterceptor)); 7:  8: // Handle to Castle's proxy generator 9: private static readonly ProxyGenerator _generator = new ProxyGenerator(); 10:  11: // generic form for those who prefer it 12: public static object Create<TInterface>(object target, TimeSpan threshold) 13: { 14: return Create(typeof(TInterface), target, threshold); 15: } 16:  17: // Form that uses type instead 18: public static object Create(Type interfaceType, object target, TimeSpan threshold) 19: { 20: return _generator.CreateInterfaceProxyWithTarget(interfaceType, target, 21: new TimedThreshold(threshold, level)); 22: } 23:  24: // The interceptor that is created to intercept the interface calls. 25: // Hidden as a private inner class so not exposing Castle libraries. 26: private class TimedThreshold : IInterceptor 27: { 28: // The threshold as a positive timespan that triggers a log message. 29: private readonly TimeSpan _threshold; 30:  31: // interceptor constructor 32: public TimedThreshold(TimeSpan threshold) 33: { 34: _threshold = threshold; 35: } 36:  37: // Intercept functor for each method invokation 38: public void Intercept(IInvocation invocation) 39: { 40: // time the method invocation 41: var timer = Stopwatch.StartNew(); 42:  43: // the Castle magic that tells the method to go ahead 44: invocation.Proceed(); 45:  46: timer.Stop(); 47:  48: // check if threshold is exceeded 49: if (timer.Elapsed > _threshold) 50: { 51: _log.WarnFormat("Long execution in {0} took {1} ms", 52: invocation.Method.Name, 53: timer.ElapsedMillseconds); 54: } 55: } 56: } 57: } Yes, it's a bit longer, but notice that: This class ONLY deals with logging long method calls, no DAO interface leftovers. This class can be used to time ANY class that has an interface or virtual methods. Personally, I like to wrap and hide the usage of DynamicProxy and IInterceptor so that anyone who uses this class doesn't need to know to add a Castle library reference.  As far as they are concerned, they're using my interceptor.  If I change to a new library if a better one comes along, they're insulated. Now, all we have to do to use this is to tell it to wrap our ProductDao and it does the rest: 1: // wraps a new ProductDao with a timing interceptor with a threshold of 5 seconds 2: IProductDao dao = TimeThresholdInterceptor.Create<IProductDao>(new ProductDao(), 5000); Automatic decoration of all methods!  You can even refine the proxy so that it only intercepts certain methods. This is ideal for so many things.  These are just some of the interceptors we've dreamed up and use: Log parameters and returns of methods to XML for auditing. Block invocations to methods and return default value (stubbing). Throw exception if certain methods are called (good for blocking access to deprecated methods). Log entrance and exit of a method and the duration. Log a message if a method takes more than a given time threshold to execute. Whether you use DynamicProxy or some other technology, I hope you see the benefits this adds.  Does it completely eliminate all need for the Decorator pattern?  No, there may still be cases where you want to decorate a particular class with functionality that doesn't apply to the world at large. But for all those cases where you are using Decorator to add functionality that's truly generic.  I strongly suggest you give this a try!

    Read the article

  • A Good Developer is So Hard to Find

    - by James Michael Hare
    Let me start out by saying I want to damn the writers of the Toughest Developer Puzzle Ever – 2. It is eating every last shred of my free time! But as I've been churning through each puzzle and marvelling at the brain teasers and trivia within, I began to think about interviewing developers and why it seems to be so hard to find good ones.  The problem is, it seems like no matter how hard we try to find the perfect way to separate the chaff from the wheat, inevitably someone will get hired who falls far short of expectations or someone will get passed over for missing a piece of trivia or a tricky brain teaser that could have been an excellent team member.   In shops that are primarily software-producing businesses or other heavily IT-oriented businesses (Microsoft, Amazon, etc) there often exists a much tighter bond between HR and the hiring development staff because development is their life-blood. Unfortunately, many of us work in places where IT is viewed as a cost or just a means to an end. In these shops, too often, HR and development staff may work against each other due to differences in opinion as to what a good developer is or what one is worth.  It seems that if you ask two different people what makes a good developer, often you will get three different opinions.   With the exception of those shops that are purely development-centric (you guys have it much easier!), most other shops have management who have very little knowledge about the development process.  Their view can often be that development is simply a skill that one learns and then once aquired, that developer can produce widgets as good as the next like workers on an assembly-line floor.  On the other side, you have many developers that feel that software development is an art unto itself and that the ability to create the most pure design or know the most obscure of keywords or write the shortest-possible obfuscated piece of code is a good coder.  So is it a skill?  An Art?  Or something entirely in between?   Saying that software is merely a skill and one just needs to learn the syntax and tools would be akin to saying anyone who knows English and can use Word can write a 300 page book that is accurate, meaningful, and stays true to the point.  This just isn't so.  It takes more than mere skill to take words and form a sentence, join those sentences into paragraphs, and those paragraphs into a document.  I've interviewed candidates who could answer obscure syntax and keyword questions and once they were hired could not code effectively at all.  So development must be more than a skill.   But on the other end, we have art.  Is development an art?  Is our end result to produce art?  I can marvel at a piece of code -- see it as concise and beautiful -- and yet that code most perform some stated function with accuracy and efficiency and maintainability.  None of these three things have anything to do with art, per se.  Art is beauty for its own sake and is a wonderful thing.  But if you apply that same though to development it just doesn't hold.  I've had developers tell me that all that matters is the end result and how you code it is entirely part of the art and I couldn't disagree more.  Yes, the end result, the accuracy, is the prime criteria to be met.  But if code is not maintainable and efficient, it would be just as useless as a beautiful car that breaks down once a week or that gets 2 miles to the gallon.  Yes, it may work in that it moves you from point A to point B and is pretty as hell, but if it can't be maintained or is not efficient, it's not a good solution.  So development must be something less than art.   In the end, I think I feel like development is a matter of craftsmanship.  We use our tools and we use our skills and set about to construct something that satisfies a purpose and yet is also elegant and efficient.  There is skill involved, and there is an art, but really it boils down to being able to craft code.  Crafting code is far more than writing code.  Anyone can write code if they know the syntax, but so few people can actually craft code that solves a purpose and craft it well.  So this is what I want to find, I want to find code craftsman!  But how?   I used to ask coding-trivia questions a long time ago and many people still fall back on this.  The thought is that if you ask the candidate some piece of coding trivia and they know the answer it must follow that they can craft good code.  For example:   What C++ keyword can be applied to a class/struct field to allow it to be changed even from a const-instance of that class/struct?  (answer: mutable)   So what do we prove if a candidate can answer this?  Only that they know what mutable means.  One would hope that this would infer that they'd know how to use it, and more importantly when and if it should ever be used!  But it rarely does!  The problem with triva questions is that you will either: Approve a really good developer who knows what some obscure keyword is (good) Reject a really good developer who never needed to use that keyword or is too inexperienced to know how to use it (bad) Approve a really bad developer who googled "C++ Interview Questions" and studied like hell but can't craft (very bad) Many HR departments love these kind of tests because they are short and easy to defend if a legal issue arrises on hiring decisions.  After all it's easy to say a person wasn't hired because they scored 30 out of 100 on some trivia test.  But unfortunately, you've eliminated a large part of your potential developer pool and possibly hired a few duds.  There are times I've hired candidates who knew every trivia question I could throw out them and couldn't craft.  And then there are times I've interviewed candidates who failed all my trivia but who I took a chance on who were my best finds ever.    So if not trivia, then what?  Brain teasers?  The thought is, these type of questions measure the thinking power of a candidate.  The problem is, once again, you will either: Approve a good candidate who has never heard the problem and can solve it (good) Reject a good candidate who just happens not to see the "catch" because they're nervous or it may be really obscure (bad) Approve a candidate who has studied enough interview brain teasers (once again, you can google em) to recognize the "catch" or knows the answer already (bad). Once again, you're eliminating good candidates and possibly accepting bad candidates.  In these cases, I think testing someone with brain teasers only tests their ability to answer brain teasers, not the ability to craft code. So how do we measure someone's ability to craft code?  Here's a novel idea: have them code!  Give them a computer and a compiler, or a whiteboard and a pen, or paper and pencil and have them construct a piece of code.  It just makes sense that if we're going to hire someone to code we should actually watch them code.  When they're done, we can judge them on several criteria: Correctness - does the candidate's solution accurately solve the problem proposed? Accuracy - is the candidate's solution reasonably syntactically correct? Efficiency - did the candidate write or use the more efficient data structures or algorithms for the job? Maintainability - was the candidate's code free of obfuscation and clever tricks that diminish readability? Persona - are they eager and willing or aloof and egotistical?  Will they work well within your team? It may sound simple, or it may sound crazy, but when I'm looking to hire a developer, I want to see them actually develop well-crafted code.

    Read the article

  • Visual Studio Little Wonders: Box Selection

    - by James Michael Hare
    So this week I decided I’d do a Little Wonder of a different kind and focus on an underused IDE improvement: Visual Studio’s Box Selection capability. This is a handy feature that many people still don’t realize was made available in Visual Studio 2010 (and beyond).  True, there have been other editors in the past with this capability, but now that it’s fully part of Visual Studio we can enjoy it’s goodness from within our own IDE. So, for those of you who don’t know what box selection is and what it allows you to do, read on! Sometimes, we want to select beyond the horizontal… The problem with traditional text selection in many editors is that it is horizontally oriented.  Sure, you can select multiple rows, but if you do you will pull in the entire row (at least for the middle rows).  Under the old selection scheme, if you wanted to select a portion of text from each row (a “box” of text) you were out of luck.  Box selection rectifies this by allowing you to select a box of text that bounded by a selection rectangle that you can grow horizontally or vertically.  So let’s think a situation that could occur where this comes in handy. Let’s say, for instance, that we are defining an enum in our code that we want to be able to translate into some string values (possibly to be stored in a database, output to screen, etc.). Perhaps such an enum would look like this: 1: public enum OrderType 2: { 3: Buy, // buy shares of a commodity 4: Sell, // sell shares of a commodity 5: Exchange, // exchange one commodity for another 6: Cancel, // cancel an order for a commodity 7: } 8:  Now, let’s say we are in the process of creating a Dictionary<K,V> to translate our OrderType: 1: var translator = new Dictionary<OrderType, string> 2: { 3: // do I really want to retype all this??? 4: }; Yes the example above is contrived so that we will pull some garbage if we do a multi-line select. I could select the lines above using the traditional multi-line selection: And then paste them into the translator code, which would result in this: 1: var translator = new Dictionary<OrderType, string> 2: { 3: Buy, // buy shares of a commodity 4: Sell, // sell shares of a commodity 5: Exchange, // exchange one commodity for another 6: Cancel, // cancel an order for a commodity 7: }; But I have a lot of junk there, sure I can manually clear it out, or use some search and replace magic, but if this were hundreds of lines instead of just a few that would quickly become cumbersome. The Box Selection Now that we have the ability to create box selections, we can select the box of text to delete!  Most of us are familiar with the fact we can drag the mouse (or hold [Shift] and use the arrow keys) to create a selection that can span multiple rows: Box selection, however, actually allows us to select a box instead of the typical horizontal lines: Then we can press the [delete] key and the pesky comments are all gone! You can do this either by holding down [Alt] while you select with your mouse, or by holding down [Alt+Shift] and using the arrow keys on the keyboard to grow the box horizontally or vertically. So now we have: 1: var translator = new Dictionary<OrderType, string> 2: { 3: Buy, 4: Sell, 5: Exchange, 6: Cancel, 7: }; Which is closer, but we still need an opening curly, the string to translate to, and the closing curly and comma. Fortunately, again, this is easy with box selections due to the fact box selection can even work for a zero-width selection! That is, hold down [Alt] and either drag down with no width, or hold down [Alt+Shift] and arrow down and you will define a selection range with no width, essentially, a vertical line selection: Notice the faint selection line on the right? So why is this useful? Well, just like with any selected range, we can type and it will replace the selection. What does this mean for box selections? It means that we can insert the same text all the way down on each line! If we have the same selection above, and type a curly and a space, we’d get: Imagine doing this over hundreds of lines and think of what a time saver it could be! Now make a zero-width selection on the other side: And type a curly and a comma, and we’d get: So close! Now finally, imagine we’ve already defined these strings somewhere and want to paste them in: 1: const private string BuyText = "Buy Shares"; 2: const private string SellText = "Sell Shares"; 3: const private string ExchangeText = "Exchange"; 4: const private string CancelText = "Cancel"; We can, again, use our box selection to pull out the constant names: And clicking copy (or [CTRL+C]) and then selecting a range to paste into: And finally clicking paste (or [CTRL+V]) to get the final result: 1: var translator = new Dictionary<OrderType, string> 2: { 3: { Buy, BuyText }, 4: { Sell, SellText }, 5: { Exchange, ExchangeText }, 6: { Cancel, CancelText }, 7: };   Sure, this was a contrived example, but I’m sure you’ll agree that it adds myriad possibilities of new ways to copy and paste vertical selections, as well as inserting text across a vertical slice. Summary: While box selection has been around in other editors, we finally get to experience it in VS2010 and beyond. It is extremely handy for selecting columns of information for cutting, copying, and pasting. In addition, it allows you to create a zero-width vertical insertion point that can be used to enter the same text across multiple rows. Imagine the time you can save adding repetitive code across multiple lines!  Try it, the more you use it, the more you’ll love it! Technorati Tags: C#,CSharp,.NET,Visual Studio,Little Wonders,Box Selection

    Read the article

  • C#/.NET Little Wonders: The Useful But Overlooked Sets

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>.  Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items.  In other words, the set {2,3,5} is identical to the set {3,5,2}.  In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2).  In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both.  Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set.  Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets.  Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation?  The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order.  Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for.  Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list.  If it is, great, we already have that market data feed!  If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it.  This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6:  7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4:  5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13:  14: // place the order logic! 15: } What’s wrong with this?  In short: performance!  Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion.  Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long.  Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection.  Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage.  So performing a sort after each add would probably add more time.  As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant.  The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface.  As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>.  That is, a Dictionary where the TKey and TValue refer to the same object.  This is really an oversimplification, but logically it makes sense.  I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups!  Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n)   Now, these times are amortized and represent the typical case.  In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains!  This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection.  Compare this to the List where almost any add or remove must rearrange potentially all the elements!  And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T>  is change our declaration.  Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4:  5: // ... 6:  7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22:  23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>.  Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order.  Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally.  Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes.  There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>.  Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n)   The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error.  There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity.  Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection.  Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection.  This is very good performance!  It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000   Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count.  This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search).  So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets.  Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets.  Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets.  Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets.  You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order.  That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it.  This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection.  Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value.  In addition, if you have need for the mathematical set operations, the C# sets support those as well.  The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order.  In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance.   Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

    Read the article

  • Visual Studio Little Wonders: Quick Launch / Quick Access

    - by James Michael Hare
    Once again, in this series of posts I look at features of Visual Studio that may seem trivial, but can help improve your efficiency as a developer. The index of all my past little wonders posts can be found here. Well, my friends, this post will be a bit short because I’m in the middle of a bit of a move at the moment.  But, that said, I didn’t want to let the blog go completely silent this week, so I decided to add another Little Wonder to the list for the Visual Studio IDE. How often have you wanted to change an option or execute a command in Visual Studio, but can’t remember where the darn thing is in the menu, settings, etc.?  If so, Quick Launch in VS2012 (or Quick Access in VS2010 with the Productivity Power Tools extension) is just for you! Quick Launch / Quick Access – find a command or option quickly For those of you using Visual Studio 2012, Quick Launch is built right into the IDE at the top of the title bar, near the minimize, maximize, and close buttons: But do not despair if you are using Visual Studio 2010, you can get Quick Access from the Productivity Power Tools extension.  To do this, you can go to the extension manager: And then go to the gallery and search for Productivity Power Tools and install it.  If you don’t have VS2012 yet, then the Productivity Power Tools is the next best thing.  This extension updates VS2010 with features such as Quick Access, the Solution Navigator, searchable Add Reference Dialog, better tab wells, etc.  I highly recommend it! But back to the topic at hand!  In VS2012 Quick Launch is built into the IDE and can be accessed by clicking in the Quick Launch area of the title bar, or by pressing CTRL+Q.  If you have VS2010 with the PPT installed, though, it is called Quick Access and is accessible through View –> Quick Access: Regardless of which IDE you are using, the feature behaves mostly the same.  It allows you to search all of Visual Studio’s commands and options for a particular topic.  For example, let’s say you want to change from tabs to tabs expanded to spaces, but don’t remember where that option is buried.  You can bring up Quick Launch / Quick Access and type in “tabs”: And it brings up a list of all options on tabs, you can then choose the one appropriate to you and click on it and it will take you right there! A lot easier than diving through the options tree to find what you are looking for!  It also works on menu commands, for example if you can’t remember how to open the Output window: It shows you the menu items that will get you to the Output window, and (if applicable) the keyboard shortcuts.  Again, clicking on one of these will perform the action for you as well. There are also some tasks you can perform directly from Quick Launch / Quick Access.  For example, perhaps you are one of those people who like to have the line numbers in your editor (I do), so let’s bring up Quick Launch / Quick Access and type “line numbers”: And let’s select Turn Line Numbers On, and now our editor looks like: And Voila!  We have line numbers in VS2010.  You can do this in VS2012 too, but it takes you to the option settings instead of directly turning them off and on.  There are bound to be differences between the way the two editors organize settings and commands, but you get the point. So, as you can see, the Quick Launch / Quick Access feature in Visual Studio makes it easy to jump right to the options, commands, or tasks you are interested in without all the digging. Summary An IDE as powerful as Visual Studio has so many options and commands that it can be confusing to remember how to find and invoke them.  Quick Launch (Quick Access in VS2010 with Productivity Power Tools extension) is a quick and handy way to jump to any of these options, commands, or tasks quickly without having to remember in what menu or screen they are buried!  Technorati Tags: C#,CSharp,.NET,Little Wonders,Visual Studio,Quick Access,Quick Launch

    Read the article

  • Premature-Optimization and Performance Anxiety

    - by James Michael Hare
    While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about.  After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations.  Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks.  In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process.  At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above.  I'm not talking about valid performance decisions.  There are decisions one should make when designing and writing an application that are valid performance decisions.  Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search).  But these in my mind are macro optimizations.  The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking.  This would have been a mistake as I would be trading maintainability (ConcurrentDictionary requires no locking which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it.  I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer.  I began college in in the 90s when C and C++ was king and hardware speed and memory were still relatively priceless commodities and not to be squandered.  In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance.  This is why in many cases such early code-bases were very hard to maintain.  I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead.  Now back then, that may have been a valid concern, but with today's modern hardware even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway) the results of removing method calls to speed up performance are negligible for the great majority of applications.  Now, obviously, there are those coding applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made.  But I would strongly advice against such optimization because of it's cost.  Many folks that are performing an optimization think it's always a win-win.  That they're simply adding speed to the application, what could possibly be wrong with that?  What they don't realize is the cost of their choice.  For every piece of straight-forward code that you obfuscate with performance enhancements, you risk the introduction of bugs in the long term technical debt of the application.  It will become so fragile over time that maintenance will become a nightmare.  I've seen such applications in places I have worked.  There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for their application to try to squeeze out every ounce of performance.  Unfortunately, the application stability often suffers as a result and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation).  To me this was almost a joke.  Twice as slow sounds bad, but it almost never as bad as you think -- especially if you're gaining safety.  The only time twice is really that much slower is when once was too slow to begin with.  Think about it.  2 minutes is slow as a response time because 1 minute is slow.  But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial!  The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations).  To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can.  I know it can be a hard habit to break, especially if you started out your career early or in a language such as C where they are very performance conscious.  But in reality, these type of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization.  I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true.  If you're a beginner, resist the urge to optimize at all costs.  And if you are an expert, delay that decision.  As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient.  Chances are it will be network, database, or disk hits that will be your slow-down, not your code.  As they say, 98% of your code's bottleneck is in 2% of your code so premature-optimization may add maintenance and safety debt that won't have any measurable impact.  Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, then you should go back and optimize further.

    Read the article

  • Code Reuse is (Damn) Hard

    - by James Michael Hare
    Being a development team lead, the task of interviewing new candidates was part of my job.  Like any typical interview, we started with some easy questions to get them warmed up and help calm their nerves before hitting the hard stuff. One of those easier questions was almost always: “Name some benefits of object-oriented development.”  Nearly every time, the candidate would chime in with a plethora of canned answers which typically included: “it helps ease code reuse.”  Of course, this is a gross oversimplification.  Tools only ease reuse, its developers that ultimately can cause code to be reusable or not, regardless of the language or methodology. But it did get me thinking…  we always used to say that as part of our mantra as to why Object-Oriented Programming was so great.  With polymorphism, inheritance, encapsulation, etc. we in essence set up the concepts to help facilitate reuse as much as possible.  And yes, as a developer now of many years, I unquestionably held that belief for ages before it really struck me how my views on reuse have jaded over the years.  In fact, in many ways Agile rightly eschews reuse as taking a backseat to developing what's needed for the here and now.  It used to be I was in complete opposition to that view, but more and more I've come to see the logic in it.  Too many times I've seen developers (myself included) get lost in design paralysis trying to come up with the perfect abstraction that would stand all time.  Nearly without fail, all of these pieces of code become obsolete in a matter of months or years. It’s not that I don’t like reuse – it’s just that reuse is hard.  In fact, reuse is DAMN hard.  Many times it is just a distraction that eats up architect and developer time, and worse yet can be counter-productive and force wrong decisions.  Now don’t get me wrong, I love the idea of reusable code when it makes sense.  These are in the few cases where you are designing something that is inherently reusable.  The problem is, most business-class code is inherently unfit for reuse! Furthermore, the code that is reusable will often fail to be reused if you don’t have the proper framework in place for effective reuse that includes standardized versioning, building, releasing, and documenting the components.  That should always be standard across the board when promoting reusable code.  All of this is hard, and it should only be done when you have code that is truly reusable or you will be exerting a large amount of development effort for very little bang for your buck. But my goal here is not to get into how to reuse (that is a topic unto itself) but what should be reused.  First, let’s look at an extension method.  There’s many times where I want to kick off a thread to handle a task, then when I want to reign that thread in of course I want to do a Join on it.  But what if I only want to wait a limited amount of time and then Abort?  Well, I could of course write that logic out by hand each time, but it seemed like a great extension method: 1: public static class ThreadExtensions 2: { 3: public static bool JoinOrAbort(this Thread thread, TimeSpan timeToWait) 4: { 5: bool isJoined = false; 6:  7: if (thread != null) 8: { 9: isJoined = thread.Join(timeToWait); 10:  11: if (!isJoined) 12: { 13: thread.Abort(); 14: } 15: } 16: return isJoined; 17: } 18: } 19:  When I look at this code, I can immediately see things that jump out at me as reasons why this code is very reusable.  Some of them are standard OO principles, and some are kind-of home grown litmus tests: Single Responsibility Principle (SRP) – The only reason this extension method need change is if the Thread class itself changes (one responsibility). Stable Dependencies Principle (SDP) – This method only depends on classes that are more stable than it is (System.Threading.Thread), and in itself is very stable, hence other classes may safely depend on it. It is also not dependent on any business domain, and thus isn't subject to changes as the business itself changes. Open-Closed Principle (OCP) – This class is inherently closed to change. Small and Stable Problem Domain – This method only cares about System.Threading.Thread. All-or-None Usage – A user of a reusable class should want the functionality of that class, not parts of that functionality.  That’s not to say they most use every method, but they shouldn’t be using a method just to get half of its result. Cost of Reuse vs. Cost to Recreate – since this class is highly stable and minimally complex, we can offer it up for reuse very cheaply by promoting it as “ready-to-go” and already unit tested (important!) and available through a standard release cycle (very important!). Okay, all seems good there, now lets look at an entity and DAO.  I don’t know about you all, but there have been times I’ve been in organizations that get the grand idea that all DAOs and entities should be standardized and shared.  While this may work for small or static organizations, it’s near ludicrous for anything large or volatile. 1: namespace Shared.Entities 2: { 3: public class Account 4: { 5: public int Id { get; set; } 6:  7: public string Name { get; set; } 8:  9: public Address HomeAddress { get; set; } 10:  11: public int Age { get; set;} 12:  13: public DateTime LastUsed { get; set; } 14:  15: // etc, etc, etc... 16: } 17: } 18:  19: ... 20:  21: namespace Shared.DataAccess 22: { 23: public class AccountDao 24: { 25: public Account FindAccount(int id) 26: { 27: // dao logic to query and return account 28: } 29:  30: ... 31:  32: } 33: } Now to be fair, I’m not saying there doesn’t exist an organization where some entites may be extremely static and unchanging.  But at best such entities and DAOs will be problematic cases of reuse.  Let’s examine those same tests: Single Responsibility Principle (SRP) – The reasons to change for these classes will be strongly dependent on what the definition of the account is which can change over time and may have multiple influences depending on the number of systems an account can cover. Stable Dependencies Principle (SDP) – This method depends on the data model beneath itself which also is largely dependent on the business definition of an account which can be very inherently unstable. Open-Closed Principle (OCP) – This class is not really closed for modification.  Every time the account definition may change, you’d need to modify this class. Small and Stable Problem Domain – The definition of an account is inherently unstable and in fact may be very large.  What if you are designing a system that aggregates account information from several sources? All-or-None Usage – What if your view of the account encompasses data from 3 different sources but you only care about one of those sources or one piece of data?  Should you have to take the hit of looking up all the other data?  On the other hand, should you have ten different methods returning portions of data in chunks people tend to ask for?  Neither is really a great solution. Cost of Reuse vs. Cost to Recreate – DAOs are really trivial to rewrite, and unless your definition of an account is EXTREMELY stable, the cost to promote, support, and release a reusable account entity and DAO are usually far higher than the cost to recreate as needed. It’s no accident that my case for reuse was a utility class and my case for non-reuse was an entity/DAO.  In general, the smaller and more stable an abstraction is, the higher its level of reuse.  When I became the lead of the Shared Components Committee at my workplace, one of the original goals we looked at satisfying was to find (or create), version, release, and promote a shared library of common utility classes, frameworks, and data access objects.  Now, of course, many of you will point to nHibernate and Entity for the latter, but we were looking at larger, macro collections of data that span multiple data sources of varying types (databases, web services, etc). As we got deeper and deeper in the details of how to manage and release these items, it quickly became apparent that while the case for reuse was typically a slam dunk for utilities and frameworks, the data access objects just didn’t “smell” right.  We ended up having session after session of design meetings to try and find the right way to share these data access components. When someone asked me why it was taking so long to iron out the shared entities, my response was quite simple, “Reuse is hard...”  And that’s when I realized, that while reuse is an awesome goal and we should strive to make code maintainable, often times you end up creating far more work for yourself than necessary by trying to force code to be reusable that inherently isn’t. Think about classes the times you’ve worked in a company where in the design session people fight over the best way to implement a class to make it maximally reusable, extensible, and any other buzzwordable.  Then think about how quickly that design became obsolete.  Many times I set out to do a project and think, “yes, this is the best design, I can extend it easily!” only to find out the business requirements change COMPLETELY in such a way that the design is rendered invalid.  Code, in general, tends to rust and age over time.  As such, writing reusable code can often be difficult and many times ends up being a futile exercise and worse yet, sometimes makes the code harder to maintain because it obfuscates the design in the name of extensibility or reusability. So what do I think are reusable components? Generic Utility classes – these tend to be small classes that assist in a task and have no business context whatsoever. Implementation Abstraction Frameworks – home-grown frameworks that try to isolate changes to third party products you may be depending on (like writing a messaging abstraction layer for publishing/subscribing that is independent of whether you use JMS, MSMQ, etc). Simplification and Uniformity Frameworks – To some extent this is similar to an abstraction framework, but there may be one chosen provider but a development shop mandate to perform certain complex items in a certain way.  Or, perhaps to simplify and dumb-down a complex task for the average developer (such as implementing a particular development-shop’s method of encryption). And what are less reusable? Application and Business Layers – tend to fluctuate a lot as requirements change and new features are added, so tend to be an unstable dependency.  May be reused across applications but also very volatile. Entities and Data Access Layers – these tend to be tuned to the scope of the application, so reusing them can be hard unless the abstract is very stable. So what’s the big lesson?  Reuse is hard.  In fact it’s damn hard.  And much of the time I’m not convinced we should focus too hard on it. If you’re designing a utility or framework, then by all means design it for reuse.  But you most also really set down a good versioning, release, and documentation process to maximize your chances.  For anything else, design it to be maintainable and extendable, but don’t waste the effort on reusability for something that most likely will be obsolete in a year or two anyway.

    Read the article

  • More Fun with C# Iterators and Generators

    - by James Michael Hare
    In my last post, I talked quite a bit about iterators and how they can be really powerful tools for filtering a list of items down to a subset of items.  This had both pros and cons over returning a full collection, which, in summary, were:   Pros: If traversal is only partial, does not have to visit rest of collection. If evaluation method is costly, only incurs that cost on elements visited. Adds little to no garbage collection pressure.    Cons: Very slight performance impact if you know caller will always consume all items in collection. And as we saw in the last post, that con for the cost was very, very small and only really became evident on very tight loops consuming very large lists completely.    One of the key items to note, though, is the garbage!  In the traditional (return a new collection) method, if you have a 1,000,000 element collection, and wish to transform or filter it in some way, you have to allocate space for that copy of the collection.  That is, say you have a collection of 1,000,000 items and you want to double every item in the collection.  Well, that means you have to allocate a collection to hold those 1,000,000 items to return, which is a lot especially if you are just going to use it once and toss it.   Iterators, though, don't have this problem.  Each time you visit the node, it would return the doubled value of the node (in this example) and not allocate a second collection of 1,000,000 doubled items.  Do you see the distinction?  In both cases, we're consuming 1,000,000 items.  But in one case we pass back each doubled item which is just an int (for example's sake) on the stack and in the other case, we allocate a list containing 1,000,000 items which then must be garbage collected.   So iterators in C# are pretty cool, eh?  Well, here's one more thing a C# iterator can do that a traditional "return a new collection" transformation can't!   It can return **unbounded** collections!   I know, I know, that smells a lot like an infinite loop, eh?  Yes and no.  Basically, you're relying on the caller to put the bounds on the list, and as long as the caller doesn't you keep going.  Consider this example:   public static class Fibonacci {     // returns the infinite fibonacci sequence     public static IEnumerable<int> Sequence()     {         int iteration = 0;         int first = 1;         int second = 1;         int current = 0;         while (true)         {             if (iteration++ < 2)             {                 current = 1;             }             else             {                 current = first + second;                 second = first;                 first = current;             }             yield return current;         }     } }   Whoa, you say!  Yes, that's an infinite loop!  What the heck is going on there?  Yes, that was intentional.  Would it be better to have a fibonacci sequence that returns only a specific number of items?  Perhaps, but that wouldn't give you the power to defer the execution to the caller.   The beauty of this function is it is as infinite as the sequence itself!  The fibonacci sequence is unbounded, and so is this method.  It will continue to return fibonacci numbers for as long as you ask for them.  Now that's not something you can do with a traditional method that would return a collection of ints representing each number.  In that case you would eventually run out of memory as you got to higher and higher numbers.  This method, though, never runs out of memory.   Now, that said, you do have to know when you use it that it is an infinite collection and bound it appropriately.  Fortunately, Linq provides a lot of these extension methods for you!   Let's say you only want the first 10 fibonacci numbers:       foreach(var fib in Fibonacci.Sequence().Take(10))     {         Console.WriteLine(fib);     }   Or let's say you only want the fibonacci numbers that are less than 100:       foreach(var fib in Fibonacci.Sequence().TakeWhile(f => f < 100))     {         Console.WriteLine(fib);     }   So, you see, one of the nice things about iterators is their power to work with virtually any size (even infinite) collections without adding the garbage collection overhead of making new collections.    You can also do fun things like this to make a more "fluent" interface for for loops:   // A set of integer generator extension methods public static class IntExtensions {     // Begins counting to inifity, use To() to range this.     public static IEnumerable<int> Every(this int start)     {         // deliberately avoiding condition because keeps going         // to infinity for as long as values are pulled.         for (var i = start; ; ++i)         {             yield return i;         }     }     // Begins counting to infinity by the given step value, use To() to     public static IEnumerable<int> Every(this int start, int byEvery)     {         // deliberately avoiding condition because keeps going         // to infinity for as long as values are pulled.         for (var i = start; ; i += byEvery)         {             yield return i;         }     }     // Begins counting to inifity, use To() to range this.     public static IEnumerable<int> To(this int start, int end)     {         for (var i = start; i <= end; ++i)         {             yield return i;         }     }     // Ranges the count by specifying the upper range of the count.     public static IEnumerable<int> To(this IEnumerable<int> collection, int end)     {         return collection.TakeWhile(item => item <= end);     } }   Note that there are two versions of each method.  One that starts with an int and one that starts with an IEnumerable<int>.  This is to allow more power in chaining from either an existing collection or from an int.  This lets you do things like:   // count from 1 to 30 foreach(var i in 1.To(30)) {     Console.WriteLine(i); }     // count from 1 to 10 by 2s foreach(var i in 0.Every(2).To(10)) {     Console.WriteLine(i); }     // or, if you want an infinite sequence counting by 5s until something inside breaks you out... foreach(var i in 0.Every(5)) {     if (someCondition)     {         break;     }     ... }     Yes, those are kinda play functions and not particularly useful, but they show some of the power of generators and extension methods to form a fluid interface.   So what do you think?  What are some of your favorite generators and iterators?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >