Case-insensitive string replace that works correctly with ligatures like "ß" <=> "ss"

Posted by usr on Stack Overflow
Published on 2010-05-14T15:32:35Z

I have built a little ASP.NET form that searches for something and displays the results. I want to highlight the search string within the search results. Example:

Query: "p"
Results: a<b>p</b>ple, banana, <b>p</b>lum

The code that I have goes like this:

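public static string HighlightSubstring(string text, string substring)
{
    // Culture-aware, case-insensitive search for the first occurrence.
    var index = text.IndexOf(substring, StringComparison.CurrentCultureIgnoreCase);
    if (index == -1) return HttpUtility.HtmlEncode(text);

    // SplitAt is a custom extension method (not shown) that cuts the text at the two offsets.
    // Problem: the end offset assumes the match in "text" is exactly substring.Length long.
    string p0, p1, p2;
    text.SplitAt(index, index + substring.Length, out p0, out p1, out p2);
    return HttpUtility.HtmlEncode(p0) + "<b>" + HttpUtility.HtmlEncode(p1) + "</b>" + HttpUtility.HtmlEncode(p2);
}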

It mostly works, but try it with HighlightSubstring("ß", "ss"), for example. This crashes because, under German culture rules, the IndexOf method considers "ß" and "ss" to be equal, but they have different lengths!

That would be fine if there were a way to find out how long the match in "text" is. Remember that this length can differ from substring.Length.

So how do I find out the length of the match that IndexOf produces in the presence of ligatures and other characters whose culture-aware equivalents have a different length?
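To make the question concrete, here is a minimal sketch of one possible workaround, not a definitive answer. It assumes that the match length can be recovered by comparing successively longer slices of "text", starting at the index IndexOf returned, against the whole search string with the same culture and options. The names Highlighter and FindMatch are hypothetical, and WebUtility.HtmlEncode stands in for HttpUtility.HtmlEncode only so that the sample compiles without a reference to System.Web.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class Highlighter
{
    // Hypothetical helper: returns the index of the first culture-aware,
    // case-insensitive match, or -1. matchLength is the length of the match
    // measured in "text", which may differ from substring.Length.
    public static int FindMatch(string text, string substring, out int matchLength)
    {
        matchLength = 0;
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(substring)) return -1;

        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;
        const CompareOptions options = CompareOptions.IgnoreCase;

        int index = compareInfo.IndexOf(text, substring, options);
        if (index < 0) return -1;

        // Grow the candidate slice until it compares equal to the search string.
        for (int length = 1; length <= text.Length - index; length++)
        {
            if (compareInfo.Compare(text, index, length,
                                    substring, 0, substring.Length, options) == 0)
            {
                matchLength = length;
                return index;
            }
        }

        // No slice starting at index compared equal; treat it as "not found".
        return -1;
    }

    public static string HighlightSubstring(string text, string substring)
    {
        int length;
        int index = FindMatch(text, substring, out length);
        if (index < 0) return WebUtility.HtmlEncode(text);

        // Cut at the measured match length instead of substring.Length.
        string before = text.Substring(0, index);
        string match = text.Substring(index, length);
        string after = text.Substring(index + length);
        return WebUtility.HtmlEncode(before)
             + "<b>" + WebUtility.HtmlEncode(match) + "</b>"
             + WebUtility.HtmlEncode(after);
    }
}
```

With this sketch, HighlightSubstring("ß", "ss") no longer throws. Whether "ß" is actually highlighted depends on whether the runtime's collation data treats it as equal to "ss". The loop costs one comparison per candidate length, which seems acceptable for short search results. On .NET 5 and later, the span-based CompareInfo.IndexOf overloads with an out matchLength parameter may make the loop unnecessary.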

© Stack Overflow or respective owner
