Why might a System.String object not cache its hash code?

Posted by Dan Tao on Stack Overflow See other posts from Stack Overflow or by Dan Tao
Published on 2010-06-16T13:37:37Z Indexed on 2010/06/16 15:12 UTC
Read the original article Hit count: 277

Filed under:
|
|
|
|

A glance at the source code for string.GetHashCode using Reflector reveals the following (for mscorlib.dll version 4.0):

public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}

Now, I realize that the implementation of GetHashCode is not specified and is implementation-dependent, so the question "is GetHashCode implemented in the form of X or Y?" is not really answerable. I'm just curious about a few things:

  1. If Reflector has disassembled the DLL correctly and this is the implementation of GetHashCode (in my environment), am I correct in interpreting this code to indicate that a string object, based on this particular implementation, would not cache its hash code?
  2. Assuming the answer is yes, why would this be? It seems to me that the memory cost would be minimal (one more 32-bit integer, a drop in the pond compared to the size of the string itself) whereas the savings would be significant, especially in cases where, e.g., strings are used as keys in a hashtable-based collection like a Dictionary<string, [...]>. And since the string class is immutable, it isn't like the value returned by GetHashCode will ever even change.

What could I be missing?


UPDATE: In response to Andras Zoltan's closing remark:

There's also the point made in Tim's answer(+1 there). If he's right, and I think he is, then there's no guarantee that a string is actually immutable after construction, therefore to cache the result would be wrong.

Whoa, whoa there! This is an interesting point to make (and yes it's very true), but I really doubt that this was taken into consideration in the implementation of GetHashCode. The statement "therefore to cache the result would be wrong" implies to me that the framework's attitude regarding strings is "Well, they're supposed to be immutable, but really if developers want to get sneaky they're mutable so we'll treat them as such." This is definitely not how the framework views strings. It fully relies on their immutability in so many ways (interning of string literals, assignment of all zero-length strings to string.Empty, etc.) that, basically, if you mutate a string, you're writing code whose behavior is entirely undefined and unpredictable.

I guess my point is that for the author(s) of this implementation to worry, "What if this string instance is modified between calls, even though the class as it is publicly exposed is immutable?" would be like for someone planning a casual outdoor BBQ to think to him-/herself, "What if someone brings an atomic bomb to the party?" Look, if someone brings an atom bomb, party's over.

© Stack Overflow or respective owner

Related posts about .NET

Related posts about string