How to diagnose, and reverse (not prevent) Unicode mangling

Posted by Steve Bennett on Stack Overflow
Published on 2010-06-02T05:40:39Z Indexed on 2010/06/02 5:43 UTC

Filed under: unicode | strings | reverse-engineering | Corruption

Somewhere upstream of me, "something" happened that looks like Unicode mangling. One symptom is that a lowercase u with umlaut (ü) gets converted to "Ã¼" (i.e., the single byte FC is turned into the two bytes C3 BC). Assuming that I have no control over this upstream process, how can I reverse-engineer what is going on? And if that is possible, can I crank the sausage machine backwards and get the original text back?

(If it helps to understand this case, the text I received was in the form of a MySQL dump. I think somewhere in the dump/transport process it got mangled.)
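
The FC to C3 BC symptom is what you would see if text that was correctly encoded as UTF-8 were later read as if it were Latin-1 (ISO-8859-1): C3 BC is the UTF-8 encoding of ü, and those two bytes read as Latin-1 are "Ã" and "¼". That is only a guess about what the upstream process did, not something the dump confirms. A minimal Python sketch under that assumption follows; the function names describe, mangle and unmangle are made up for illustration.

```python
def describe(text: str) -> str:
    """Diagnostic aid: list the code points that are actually in the string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def mangle(text: str) -> str:
    """Reproduce the suspected corruption: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def unmangle(text: str) -> str:
    """Reverse the suspected corruption.

    Re-encoding as Latin-1 recovers the original bytes, which are then
    decoded as UTF-8. If either step fails, the text was not mangled in
    this particular way (or is already clean), so it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    broken = mangle("für")
    print(describe(broken))   # code points of the mangled string
    print(unmangle(broken))   # the repaired string
```

If the round trip through mangle reproduces exactly what you received, the hypothesis is probably right and unmangle should recover the original. If the misreading used Windows-1252 rather than Latin-1, characters in the 0x80 to 0x9F range will differ and "cp1252" may be the codec to try instead. Text that went through the same misreading twice would need unmangle applied twice.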

© Stack Overflow or respective owner

