Dealing with wacky encodings in Python

Posted by Tyson on Stack Overflow
Published on 2010-06-07T05:42:59Z

I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are Unicode, but what I actually get is any of the following variations on a theme (as returned by repr()):

u'D\\xc3\\xa9cor'
u'D\xc3\xa9cor'
'D\\xc3\\xa9cor'
'D\xc3\xa9cor'

Is there a reliable way to take any of the four strings above and return the proper Unicode string?

u'D\xe9cor' # --> Décor

The only way I can think of right now uses eval(), replace(), and a deep, burning shame that will never wash away.
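
For what it's worth, here is a minimal eval()-free sketch of one possible approach. It is only a heuristic, not a guaranteed fix, and it rests on a few assumptions: it is written for Python 3 (where the u'...' forms above are str and the plain '...' forms are bytes), the underlying data really is UTF-8, and any text whose characters all fit in a single byte is treated as bytes that were wrongly decoded as Latin-1. The name fix_text is hypothetical.

```python
import re

# Matches a literal backslash-x escape such as the six characters \xc3
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def fix_text(value):
    """Best-effort repair of the four variants (hypothetical helper).

    Assumes the real data is UTF-8; returns the input unchanged (as text)
    when it does not look like damaged UTF-8.
    """
    if isinstance(value, str):
        try:
            # Text with only code points < 256 may be mis-decoded bytes.
            raw = value.encode("latin-1")
        except UnicodeEncodeError:
            return value  # has characters outside Latin-1; leave it alone
    else:
        raw = bytes(value)

    # Turn literal "\xNN" escape sequences into the bytes they spell out.
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 after all, so fall back to the original text.
        return value if isinstance(value, str) else raw.decode("latin-1")


variants = [
    "D\\xc3\\xa9cor",   # text containing literal backslash escapes
    "D\xc3\xa9cor",     # UTF-8 bytes mis-decoded as Latin-1
    b"D\\xc3\\xa9cor",  # bytes containing literal backslash escapes
    b"D\xc3\xa9cor",    # plain UTF-8 bytes
]
results = [fix_text(v) for v in variants]
```

Already-correct text such as 'D\xe9cor' passes through untouched because it is not valid UTF-8 once re-encoded. The trade-off is that a string which legitimately contains a sequence like 'Ã©' would be "repaired" by mistake, so this is only safe when the sources are known to be UTF-8.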

© Stack Overflow or respective owner
