How do I ignore the UTF-8 Byte Order Mark in String comparisons?

Posted by Skrud on Stack Overflow, 2010-05-26T17:07:30Z

I'm having a problem comparing strings in a unit test in C# 4.0 using Visual Studio 2010. The same test passes in Visual Studio 2008 (with C# 3.0 / .NET 3.5).

Here's the relevant code snippet:

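    using System.Globalization;
    using System.Text;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    byte[] rawData = GetData();                      // the test's own data source
    string data = Encoding.UTF8.GetString(rawData);  // decode the raw bytes as UTF-8

    Assert.AreEqual("Constant", data, false, CultureInfo.InvariantCulture); // case-sensitive comparison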

While debugging this test, the data string looks identical to the literal. When I called data.ToCharArray(), I noticed that the first character of data has the value 65279 (U+FEFF), which is the Unicode Byte Order Mark (BOM); UTF-8 stores it as the three bytes EF BB BF. What I don't understand is why Encoding.UTF8.GetString() keeps this character.
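The behavior can be reproduced without the test's own GetData() (whose source isn't shown) using a small sketch. It assumes the input is the text "Constant" saved with a UTF-8 BOM; the class BomRepro and its methods are hypothetical names for illustration. Decoding those bytes with Encoding.UTF8.GetString(byte[]) yields a 9-character string whose first char is U+FEFF, so it no longer equals "Constant".

```csharp
using System;
using System.Text;

public static class BomRepro
{
    // Hypothetical stand-in for the question's GetData(): the UTF-8 bytes of
    // "Constant" preceded by the UTF-8 BOM (EF BB BF), as a file saved
    // "with signature" would contain (assumption).
    public static byte[] GetDataWithBom()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();          // { 0xEF, 0xBB, 0xBF }
        byte[] body = Encoding.UTF8.GetBytes("Constant");  // GetBytes never writes a BOM
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }

    // Decodes exactly as the question does; the BOM comes back as the char U+FEFF.
    public static string DecodeLikeTheQuestion(byte[] rawData)
    {
        return Encoding.UTF8.GetString(rawData);
    }
}
```

Calling DecodeLikeTheQuestion(GetDataWithBom()) returns a string of length 9 whose first char is 65279, matching what the debugger showed.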

How do I stop Encoding.UTF8.GetString() from putting the Byte Order Mark in the resulting string?
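One possible approach, sketched here rather than offered as a definitive fix, is to check for the UTF-8 preamble and decode only the bytes after it. Another is to let StreamReader do the decoding, since it detects and skips a BOM. The class Utf8NoBom and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8NoBom
{
    // Decodes UTF-8 bytes, skipping a leading UTF-8 BOM (EF BB BF) if present.
    // Encoding.UTF8.GetString(byte[]) does not skip it on its own.
    public static string Decode(byte[] rawData)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
        int offset = 0;
        if (rawData.Length >= preamble.Length)
        {
            bool hasBom = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (rawData[i] != preamble[i]) { hasBom = false; break; }
            }
            if (hasBom) offset = preamble.Length;
        }
        // Decode only the bytes that follow the BOM (or all bytes if there is none).
        return Encoding.UTF8.GetString(rawData, offset, rawData.Length - offset);
    }

    // Alternative: StreamReader recognizes a leading BOM and does not return it
    // (detectEncodingFromByteOrderMarks: true).
    public static string DecodeWithStreamReader(byte[] rawData)
    {
        using (var reader = new StreamReader(new MemoryStream(rawData), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the test, `string data = Utf8NoBom.Decode(rawData);` would then compare equal to "Constant". If the string has already been decoded, `data.TrimStart('\uFEFF')` also drops a leading BOM, though it removes every leading U+FEFF rather than just one.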

© Stack Overflow or respective owner
