C#/.NET Little Wonders: Static Char Methods

Posted by James Michael Hare on Geeks with Blogs
Published on Thu, 04 Oct 2012 16:51:45 GMT

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.

Often in our code we deal with the bigger classes and types in the BCL and occasionally forget that there are some nice methods on the primitive types as well.  Today we will discuss some of the handy static methods that exist on the char type (the C# alias of System.Char).

The Background

I was examining a piece of code this week where I saw the following:

    // need to get the 5th (offset 4) character in upper case
    var type = symbol.Substring(4, 1).ToUpper();

    // test to see if the type is P
    if (type == "P")
    {
        // ... do something with P type...
    }

Is there really any error in this code?  No, but it still struck me as wrong because it allocates two very short-lived throw-away strings just to store and manipulate a single char:

  1. The call to Substring() generates a new string of length 1
  2. The call to ToUpper() generates a new upper-case version of the string from Step 1.

In my mind this is similar to using ToUpper() to do a case-insensitive compare: it isn’t wrong, it’s just much heavier than it needs to be (for more info on case-insensitive compares, see #2 in 5 More Little Wonders).
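To make the comparison concrete, here is a minimal sketch of the two ways to do a case-insensitive compare. The class name, method names, and sample values are made up for illustration; only ToUpper(), string.Equals(), and StringComparison.OrdinalIgnoreCase come from the framework.

```csharp
using System;

// Hypothetical class and method names, used only for illustration.
public static class CaseInsensitiveCompareSketch
{
    // Heavier: ToUpper() can allocate an upper-cased copy of each string
    // just so the two copies can be compared.
    public static bool EqualsViaToUpper(string left, string right)
    {
        return left.ToUpper() == right.ToUpper();
    }

    // Lighter: compares the original strings in place, ignoring case.
    public static bool EqualsIgnoringCase(string left, string right)
    {
        return string.Equals(left, right, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Sample values are assumptions.
        Console.WriteLine(EqualsViaToUpper("msft", "MSFT"));   // True
        Console.WriteLine(EqualsIgnoringCase("msft", "MSFT")); // True
    }
}
```

Note that ToUpper() is culture-sensitive while OrdinalIgnoreCase is not, so the two are not guaranteed to agree for every input in every culture.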

One of my favorite books is the C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Sutter and Alexandrescu.  True, it’s about C++ standards, but there’s also some great general programming advice in there, including two rules I love:

        8. Don’t Optimize Prematurely
        9. Don’t Pessimize Prematurely

We all know what #8 means: don’t optimize when there is no immediate need, especially at the expense of readability and maintainability.  I firmly believe this, and in the axiom that it’s easier to make correct code fast than to make fast code correct.  Optimizing code to the point that it becomes difficult to maintain often gives you little bang for the buck.

But what about #9?  Well, for that they state:

“All other things being equal, notably code complexity and readability, certain efficient design patterns and coding idioms should just flow naturally from your fingertips and are no harder to write than the pessimized alternatives. This is not premature optimization; it is avoiding gratuitous pessimization.”

Or, if I may paraphrase: “where it doesn’t increase the code complexity or hurt readability, prefer the more efficient option”.

The example code above is, I feel, one of those times where we are violating a tacit C# coding idiom: avoid creating unnecessary temporary strings.  The code creates temporary strings to hold one char, which is just unnecessary.  I think the original coder thought he had to do this because ToUpper() is an instance method on string but not on char.  What he didn’t know, however, is that ToUpper() does exist on char; it’s just a static method instead (though you could write an extension method to make it look instance-ish).
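As a rough sketch of the extension-method idea, something like the following would work. The class name CharExtensions, the method name ToUpperCase(), and the sample symbol are all hypothetical; the only framework call involved is the static char.ToUpper().

```csharp
using System;

// Hypothetical helper class; nothing here is part of the framework itself.
public static class CharExtensions
{
    // Wraps the static char.ToUpper() so it can be called with instance syntax.
    public static char ToUpperCase(this char value)
    {
        return char.ToUpper(value);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string symbol = "IBM1p";              // sample value, assumed for illustration
        char type = symbol[4].ToUpperCase();  // reads like an instance method call
        Console.WriteLine(type == 'P');       // True
    }
}
```

Whether the instance-style syntax is worth an extra helper is a matter of taste; calling char.ToUpper() directly does the same thing.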

This leads me (in a long-winded way) to my Little Wonders for the day…

Static Methods of System.Char

So let’s look at some of these handy, and often overlooked, static methods on the char type:

  • IsDigit(), IsLetter(), IsLetterOrDigit(), IsPunctuation(), IsWhiteSpace()
    • Methods to tell you whether a char (or position in a string) belongs to a category of characters.
  • IsLower(), IsUpper()
    • Methods that check whether a char (or position in a string) is lower or upper case.
  • ToLower(), ToUpper()
    • Methods that convert a single char to its lower-case or upper-case equivalent.

For example, if you wanted to see if a string contained any lower case characters, you could do the following:

    // Any() is a LINQ extension method (requires using System.Linq;)
    if (symbol.Any(c => char.IsLower(c)))
    {
        // ...
    }

Incidentally, we could use a method group to shorten that expression to:

    if (symbol.Any(char.IsLower))
    {
        // ...
    }

Or, if you wanted to verify that all of the characters in a string are digits:

    if (symbol.All(char.IsDigit))
    {
        // ...
    }
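These checks compose nicely. The sketch below combines a few of them into a single validation method; the "well formed" rule, the class and method names, and the sample symbols are invented purely for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical class name, used only for illustration.
public static class CharCategorySketch
{
    // Made-up rule: a symbol is "well formed" if it is non-empty, contains
    // only letters or digits, and has no lower-case letters.
    public static bool IsWellFormed(string symbol)
    {
        return !string.IsNullOrEmpty(symbol)
            && symbol.All(char.IsLetterOrDigit)
            && !symbol.Any(char.IsLower);
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("AB12"));  // True
        Console.WriteLine(IsWellFormed("msft"));  // False (lower-case letters)
        Console.WriteLine(IsWellFormed("BRK.B")); // False ('.' is punctuation)
    }
}
```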

Also, the IsXxx() methods have overloads that take either a char, or a string and an index.  This means that these two calls are logically identical:

    // check given a character
    if (char.IsUpper(symbol[0])) { /* ... */ }

    // check given a string and index
    if (char.IsUpper(symbol, 0)) { /* ... */ }

Obviously, if you just have a char, then you’d just use the first form.  But if you have a string you can use either form equally well.

As a side note, care should be taken when examining all the available static methods on the System.Char type, as some seem to be redundant but actually have very different purposes. 

For example, there are IsDigit() and IsNumber() methods, which sound the same on the surface, but give you different results. IsDigit() returns true only for decimal digit characters (‘0’, ‘1’, … ‘9’, plus the decimal digits of other scripts that Unicode puts in the same category), whereas IsNumber() returns true for any numeric character, including the characters for ½, ¼, etc.
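A short sketch makes the difference visible. Note that the method on System.Char is spelled IsNumber(). The class name and the IsBasicDigit() helper are hypothetical; the helper is just one possible way to accept only ‘0’ through ‘9’ if that is what you really need.

```csharp
using System;

// Hypothetical class name, used only for illustration.
public static class DigitVersusNumberSketch
{
    // Hypothetical helper: true only for the ten characters '0' through '9'.
    public static bool IsBasicDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        char seven = '7';
        char half = '\u00BD';        // the fraction character for one half
        char arabicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE

        Console.WriteLine(char.IsDigit(seven));        // True
        Console.WriteLine(char.IsNumber(seven));       // True

        Console.WriteLine(char.IsDigit(half));         // False
        Console.WriteLine(char.IsNumber(half));        // True

        Console.WriteLine(char.IsDigit(arabicThree));  // True
        Console.WriteLine(IsBasicDigit(arabicThree));  // False
    }
}
```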

Summary

To come full circle back to our opening example, I would have preferred the code be written like this:

    // grab 5th char and take upper case version of it
    var type = char.ToUpper(symbol[4]);

    if (type == 'P')
    {
        // ... do something with P type...
    }

Not only is it just as readable (if not more so), but it performs over 3x faster on my machine:

   1,000,000 iterations of char method took: 30 ms, 0.000030 ms/item.
   1,000,000 iterations of string method took: 101 ms, 0.000101 ms/item.

It’s not only immediately faster because we don’t allocate temporary strings, but as an added bonus there is less garbage to collect later as well.  To me this qualifies as a case where we are using a common C# performance idiom (don’t create unnecessary temporary strings) to make our code better.
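The harness behind those timings isn’t shown here, so the following is only a sketch of how such a comparison could be measured with System.Diagnostics.Stopwatch. The class, the method names, and the sample symbol are assumptions, and absolute timings will vary by machine, runtime, and build settings.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; not the code used for the numbers in the post.
public static class TimingSketch
{
    // String approach: Substring() and ToUpper() each create a temporary string.
    public static bool IsTypePViaString(string symbol)
    {
        return symbol.Substring(4, 1).ToUpper() == "P";
    }

    // Char approach: indexes the string and upper-cases a single char.
    public static bool IsTypePViaChar(string symbol)
    {
        return char.ToUpper(symbol[4]) == 'P';
    }

    // Runs the given check repeatedly and returns the elapsed milliseconds.
    // The delegate call adds the same small overhead to both approaches.
    public static long TimeMilliseconds(Func<string, bool> check, string symbol, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check(symbol);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        string symbol = "IBM1p"; // sample value, assumed for illustration

        Console.WriteLine("char method took:   {0} ms",
            TimeMilliseconds(IsTypePViaChar, symbol, iterations));
        Console.WriteLine("string method took: {0} ms",
            TimeMilliseconds(IsTypePViaString, symbol, iterations));
    }
}
```

For anything more serious than a quick sanity check, run a warm-up pass first and measure a release build outside the debugger.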

© Geeks with Blogs or respective owner