URL-encoded UTF-8 string gets mangled into Latin-1

Posted by Ken Mayer on Stack Overflow
Published on 2010-04-01T02:12:52Z


I'm parsing my nginx logs, and I want to discover some details from the HTTP_REFERER string, for example, the query string used to find the web site. One user typed in "México" which gets encoded in the log as "query=M%E9xico".

Passing this through Rack::Utils.parse_query('query=M%E9xico') returns the hash {"query" => "M?xico"}.

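When you try to stuff "M?xico" into Postgres (but not the more forgiving SQLite), it pukes because the string isn't valid UTF-8. Looking at http://rack.rubyforge.org/doc/Rack/Utils.html#M000324, unescape is just packing the hex digits into raw bytes, so %E9 comes back as the single Latin-1 byte 0xE9 rather than the two-byte UTF-8 sequence for "é".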

How can I convert the string back to UTF-8, or can I get parse_query to return UTF-8 in the first place?
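
For reference, here is a minimal sketch of the kind of conversion I'm after. It assumes Ruby 1.9 or later (String encodings) and uses the standard library's URI.decode_www_form_component rather than Rack. The helper name decode_to_utf8 is made up for illustration, and it assumes that any bytes which aren't valid UTF-8 were sent as ISO-8859-1, which may not hold for every referrer.

```ruby
require 'uri'

# Hypothetical helper (not part of Rack): percent-decode a value and
# return a valid UTF-8 string.
# ASSUMPTION: bytes that are not valid UTF-8 are ISO-8859-1 (Latin-1).
def decode_to_utf8(escaped)
  # Decode to raw bytes first, without claiming any text encoding yet.
  raw = URI.decode_www_form_component(escaped, Encoding::BINARY)

  # If the bytes already form valid UTF-8, just label them as such.
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Otherwise reinterpret the bytes as Latin-1 and transcode to UTF-8.
  raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

decode_to_utf8('M%E9xico')     # Latin-1 input   => "México" (UTF-8)
decode_to_utf8('M%C3%A9xico')  # UTF-8 input     => "México" (UTF-8)
```

Checking for valid UTF-8 first means referrers that already send UTF-8 pass through untouched; only the invalid ones get the Latin-1 fallback.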

© Stack Overflow or respective owner
