Perl strings internals

Posted by n0rd on Stack Overflow
Published on 2010-06-03T08:30:14Z

Filed under: perl, string, encoding

How are Perl strings represented internally? What encoding is used? How do I handle different encodings properly?

I've been using Perl for quite a long time, but that work didn't involve much string handling in different encodings, and whenever I ran into a minor problem that had something to do with encodings I usually resorted to some shamanic actions.

Until now I thought of Perl strings as sequences of bytes, which fit my tasks pretty well. Now I need to process a UTF-8 encoded file, and here the trouble starts.

First, I read the file into a string like this:

open(my $in, '<', $ARGV[0]) or die "cannot open file $ARGV[0] for reading: $!";
binmode($in, ':utf8');

my $contents;

{
    local $/;
    $contents = <$in>;
}

close($in);

then simply print it:

print $contents;

And I get two things: the warning "Wide character in print at <scriptname> line <n>" and garbage in the console. So I can conclude that Perl strings have a concept of a "character" that can be "wide" or not, but when printed these "wide" characters are represented in the console as multiple bytes, not as a single "character". (I now wonder why all my previous experience with binary files worked just as I expected, without any "character" issues.)
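To make the byte/character distinction concrete, here is a minimal sketch. It assumes the core Encode module and uses a hard-coded byte string in place of the file contents; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "é€" (2 bytes + 3 bytes); stands in for file contents.
my $bytes = "\xC3\xA9\xE2\x82\xAC";

# Decoding turns the byte sequence into a string of characters (code points).
my $chars = decode('UTF-8', $bytes);

my $byte_count = length $bytes;              # counts bytes
my $char_count = length $chars;              # counts characters
my $euro_code  = ord substr($chars, 1, 1);   # code point of the euro sign

# A character above 255 does not fit in one byte, so printing it to a handle
# with no encoding layer triggers the warning; encoding explicitly gives
# plain bytes that can be printed as-is.
my $out = encode('UTF-8', $chars);

printf "bytes=%d chars=%d euro=U+%04X\n", $byte_count, $char_count, $euro_code;
```

The same string is five bytes before decoding and two characters after it, which would explain why purely binary data never shows any "character" behaviour: it is never decoded.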

Why, then, do I see garbage in the console? If Perl stores strings as characters in some known encoding, I don't think it would be a big problem to find out the console encoding and print the text properly. (I use Windows, BTW.)
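As a rough sketch of what "printing in the console encoding" could look like: an encoding layer on the output handle converts characters to bytes on the way out. The encoding name cp866 below is only an assumption standing in for whatever the console actually uses, and an in-memory handle replaces the console so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Hypothetical console encoding; the real one has to be looked up on the system.
my $console_encoding = 'cp866';

# "Привет" written as code points, so the source file encoding does not matter.
my $text = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";

# An in-memory handle stands in for the console.
my $buffer = '';
open(my $fh, ">:encoding($console_encoding)", \$buffer)
    or die "cannot open in-memory handle: $!";
print {$fh} $text;
close($fh);

# The same layer can be attached to the real console handle:
# binmode(STDOUT, ":encoding($console_encoding)");

printf "%d characters became %d bytes\n", length($text), length($buffer);
```

With a layer like this in place the "Wide character" warning should not appear, because the handle knows how to turn each character into bytes.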

If Perl stores strings as multibyte sequences (e.g. using the same UTF-8 encoding), why is it done this way? From my C experience, handling multibyte strings is PAIN.
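One way to poke at the internal storage is sketched below. It relies on the built-in utf8::upgrade function and on bytes::length, which is meant for inspection only; the sample string and variable names are illustrative. The point it tries to show is that the storage format can change while the string, seen as characters, stays the same.

```perl
use strict;
use warnings;

my $text = "na\x{EF}ve";   # "naïve": every code point fits in one byte
my $copy = $text;

# utf8::upgrade switches the internal storage to UTF-8 without changing
# the characters the string contains.
utf8::upgrade($copy);

my $same_string = ($text eq $copy) ? 1 : 0;
my $char_len    = length $copy;              # still 5 characters
my $third       = ord substr($copy, 2, 1);   # still U+00EF

# bytes::length reports the size of the internal buffer (inspection only).
require bytes;
my $internal_before = bytes::length($text);
my $internal_after  = bytes::length($copy);

print "equal=$same_string chars=$char_len internal=$internal_before/$internal_after\n";
```

length, substr and eq all work on characters here, so the multibyte bookkeeping familiar from C stays hidden as long as the code does not reach into the internal buffer.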
