Culture Sensitive GetHashCode

Posted by user114928 on Stack Overflow
Published on 2010-06-07T09:37:27Z Indexed on 2010/06/07 9:42 UTC

Hi,

I'm writing a C# application that processes some text and provides basic query functions. To give the best possible support for other languages, I let users of the application specify the System.Globalization.CultureInfo (via an "en-GB" style code) and also the full range of collation options through the System.Globalization.CompareOptions flags enum.

For regular string comparison I'm then using a combination of:

a) The String.Compare overload that accepts the culture and options
b) For some bulk processes, the cached byte data (KeyData) from CompareInfo.GetSortKey (the overload that accepts the options), compared byte by byte.
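
To make (a) and (b) concrete, here is a minimal sketch of both. The class and method names (ComparisonSketch, CompareDirect, CompareKeyData) are made up for illustration and are not framework APIs. As far as I can tell from the SortKey documentation, comparing KeyData byte by byte is meant to give the same ordering as the culture-aware comparison with the same options, which is why mixing the two looks safe as long as culture and options match.

```csharp
using System;
using System.Globalization;

// Hypothetical helper names, for illustration only.
static class ComparisonSketch
{
    // (a) Culture-aware comparison using the caller's culture and options.
    public static int CompareDirect(string x, string y,
                                    CultureInfo culture, CompareOptions options)
    {
        return string.Compare(x, y, culture, options);
    }

    // Builds the byte data that would be cached for bulk processing.
    public static byte[] GetKeyData(string s,
                                    CultureInfo culture, CompareOptions options)
    {
        return culture.CompareInfo.GetSortKey(s, options).KeyData;
    }

    // (b) Byte-by-byte comparison of two cached KeyData arrays.
    // Returns -1, 0 or 1; a shorter array that is a prefix of the other sorts first.
    public static int CompareKeyData(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i])
            {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        return a.Length.CompareTo(b.Length);
    }
}
```

With culture "en-GB" and CompareOptions.IgnoreCase, I would expect CompareDirect("Hello", "HELLO", ...) and CompareKeyData over the two sort keys to both report equality.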

This seemed fine (although please comment if you think these two methods shouldn't be mixed), but then I had reason to use the HashSet<> class, whose constructors only accept an IEqualityComparer<> for customising comparison.

MS documentation seems to suggest that I should use StringComparer (which implements both IEqualityComparer<> and IComparer<>), but this only seems to support the "IgnoreCase" option from CompareOptions and not "IgnoreKanaType", "IgnoreSymbols", "IgnoreWidth" etc.

I'm assuming that a StringComparer that does not honour these other options could produce different hash codes for two strings that my other comparison options would treat as equal. I'd therefore get incorrect results from my application.

My only thought at the moment is to create my own IEqualityComparer<> that generates a hash code from the SortKey.KeyData and compares equality by using the String.Compare overload.
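
A rough sketch of what I have in mind is below. The class name CultureOptionsEqualityComparer is hypothetical, and the FNV-1a hash over the key bytes is just one convenient choice, not something the framework prescribes. It uses CompareInfo.Compare, which I understand to be what the String.Compare overload with culture and options delegates to. One assumption to flag: as far as I know, GetSortKey rejects CompareOptions.Ordinal and OrdinalIgnoreCase, so this sketch only covers the culture-sensitive options.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical comparer; the name and hashing scheme are illustrative only.
sealed class CultureOptionsEqualityComparer : IEqualityComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public CultureOptionsEqualityComparer(CultureInfo culture, CompareOptions options)
    {
        if (culture == null) throw new ArgumentNullException("culture");
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    // Equality uses the same culture and options as the rest of the application.
    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return _compareInfo.Compare(x, y, _options) == 0;
    }

    // Hash the sort-key bytes so that strings with equal sort keys
    // get the same hash code.
    public int GetHashCode(string s)
    {
        if (s == null) return 0;
        byte[] key = _compareInfo.GetSortKey(s, _options).KeyData;
        unchecked
        {
            uint hash = 2166136261;   // FNV-1a offset basis
            foreach (byte b in key)
            {
                hash ^= b;
                hash *= 16777619;     // FNV-1a prime
            }
            return (int)hash;
        }
    }
}
```

It would be passed to the set as new HashSet<string>(new CultureOptionsEqualityComparer(culture, options)). The obvious cost is that every hash computation builds a sort key, so for bulk work the cached KeyData could be hashed directly instead.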

Any suggestions?

© Stack Overflow or respective owner