Python and Unicode: How everything should be Unicode

Posted by A A on Stack Overflow
Published on 2010-12-27T18:15:29Z

Forgive me if this is a long question:

I have been programming in Python for around six months. I am self-taught: I started with the Python tutorial, then SO, and then just used Google for everything else.

Here is the sad part: no one told me all strings should be Unicode. No, I am not lying or making this up, but where does the tutorial mention it? Most examples I see also just use byte strings instead of Unicode strings. I was just browsing and came across this question on SO, which says that every string in Python should be a Unicode string. This pretty much made me cry!

I read that every string in Python 3.0 is Unicode by default, so my questions are for 2.x:

  1. Should I write:

    print u'Some text' or just print 'Text'?

  2. If everything should be Unicode, does that mean that if I have a tuple:

    t = ('First', 'Second'), it should be t = (u'First', u'Second')?

    I read that I can use from __future__ import unicode_literals and then every string literal will be a Unicode string, but should I also do this for strings inside a container?

  3. When reading from or writing to a file, I should use the codecs module, right? Or should I just use the standard way of reading/writing and encode or decode where required?

  4. If I get a string from, say, raw_input(), should I convert that to Unicode as well?
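
To make questions 2 and 3 concrete, here is a minimal sketch of the options being compared. It assumes UTF-8 files; the scratch file name demo.txt is hypothetical, and the explicit u'' prefix is used so the snippet runs unchanged on 2.6+/2.7 and 3.3+. This illustrates the choices only and does not claim to be the recommended approach.

```python
import codecs
import io
import os
import tempfile

# Question 2: each literal inside a container is an ordinary string literal,
# so it takes its own u'' prefix. On 2.x, putting
# `from __future__ import unicode_literals` at the top of the module would
# make the unprefixed form behave the same way.
t = (u'First', u'Second')
text_type = type(u'')  # `unicode` on 2.x, `str` on 3.x
all_unicode = all(isinstance(item, text_type) for item in t)

# Question 3: both codecs.open and io.open decode on read when given an
# encoding, so the caller only ever sees Unicode strings.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical scratch file
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\u00e9')

with codecs.open(path, 'r', encoding='utf-8') as f:
    via_codecs = f.read()

with io.open(path, 'r', encoding='utf-8') as f:
    via_io = f.read()
```

In this sketch both readers hand back the same Unicode string, so the choice between them is about style and newline handling, not about whether decoding happens.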

What is the common approach to handling all of the above issues in 2.x? The from __future__ import unicode_literals statement?
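
One pattern often described for 2.x is to decode bytes as soon as they enter the program, work with Unicode internally, and encode only on output. The sketch below assumes UTF-8 input; to_text is a hypothetical helper, and the byte string stands in for what raw_input() returns on 2.x (in a real terminal, the encoding might be taken from sys.stdin.encoding instead).

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as a Unicode string, decoding only if it is bytes.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on 2.6+
        return value.decode(encoding)
    return value


raw = b'caf\xc3\xa9'             # stand-in for the bytes raw_input() returns on 2.x
name = to_text(raw)              # Unicode from this point on
encoded = name.encode('utf-8')   # encode again only when writing out
```

Calling to_text on a value that is already Unicode returns it unchanged, so the helper can be applied at every input boundary without double-decoding.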

Sorry for being such a noob, but this changes what I have been doing for a long time, so clearly I am confused.

© Stack Overflow or respective owner