latin1/unicode conversion problem with ajax request and special characters

Posted by mfn on Stack Overflow
Published on 2010-05-05T10:29:46Z

The server runs PHP5 and the HTML charset is latin1 (iso-8859-1). With regular form POST requests there's no problem with "special" characters such as the en dash (–). I don't know for sure why, but it works, probably because the browser has a representable character at char code 150 (which is what ord() shows me in PHP on the server for a literal en dash).

Now our application also provides a preview mechanism via ajax: the text is sent to the server and the complete HTML for a preview is sent back. However, when the same en dash is sent via ajax (tested with GET and POST), it no longer arrives as char code 150 but as %E2%80%93. I can already see this in the Apache log.

According to various sources I found, e.g. http://www.tachyonsoft.com/uc0020.htm , this is the UTF-8 byte representation of the en dash (U+2013), and as far as I know JavaScript handles everything in Unicode.

Within my app, however, I need everything in latin1. Simply put: just as a regular POST request gives me the en dash as char code 150, I need the same result for the UTF-8 representation once it has been converted.
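To make the observation reproducible, here is a minimal sketch of what presumably happens on the client. It assumes the ajax code (or the library behind it) builds the request with encodeURIComponent, which percent-encodes the UTF-8 bytes of a string regardless of the page charset. The variable names are only illustrative.

```javascript
// The dash from the question, written as an escape so that the
// encoding of this source file does not matter.
const dash = "\u2013";

// encodeURIComponent percent-encodes the UTF-8 bytes of the string,
// no matter which charset the HTML page declares.
const encoded = encodeURIComponent(dash);

// The Unicode code point of the character, for comparison with the
// 0..255 range that a single-byte charset can address.
const codePoint = dash.codePointAt(0);

console.log(encoded);                // %E2%80%93
console.log(codePoint.toString(16)); // 2013
```

So the three bytes in the Apache log belong to code point U+2013, which is far above 255 and therefore not the same thing as "char code 150".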

That's where I'm failing: when I try to decode it in PHP on the server with either utf8_decode(...) or iconv('UTF-8', 'iso-8859-1', ...), in both cases I get a plain ? in place of this character (and iconv also throws a notice: Detected an illegal character in input string).
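One possible reading of the ?, sketched in JavaScript rather than PHP: ISO-8859-1 only covers U+0000 to U+00FF, so a strict conversion has nothing to map U+2013 to. The helper toLatin1Bytes below is hypothetical and only mimics that substitution behaviour; it is not what utf8_decode or iconv do internally.

```javascript
// Hypothetical helper: encode a string as ISO-8859-1 bytes and
// substitute "?" (0x3f) for every character outside U+0000..U+00FF.
function toLatin1Bytes(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    bytes.push(cp <= 0xff ? cp : 0x3f);
  }
  return bytes;
}

// The value seen in the Apache log, decoded back to a string first.
const decoded = decodeURIComponent("%E2%80%93");

console.log(toLatin1Bytes(decoded));  // one byte: 63, i.e. "?"
console.log(toLatin1Bytes("\u00fc")); // one byte: 252, "ü" fits into ISO-8859-1
```

Byte 150 (0x96) is the position of this dash in Windows-1252, not in ISO-8859-1, where that byte is a control character. Browsers commonly treat a page labelled iso-8859-1 as Windows-1252, so it may be that the regular form POST is really delivering Windows-1252 bytes. That is an assumption and has not been verified against this application.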

My goal is to find an automated solution, but maybe I'm trying to be überclever in this case?

I've found other people simply doing manual replacement with a predefined input/output set, but that would always leave me with the feeling that I could lose characters.

The observant reader will note that I'm behind on understanding the full impact and complexity of Unicode and character conversion, and I definitely prefer to understand the thing as a whole rather than rely on a simple manual mapping.

thanks

© Stack Overflow or respective owner
