Guessing UTF-8 encoding

Posted by Dervin Thunk on Stack Overflow
Published on 2009-09-11T00:03:59Z Indexed on 2010/04/03 8:13 UTC

I have a question that may be quite naive, but I feel the need to ask, because I don't really know what is going on. I'm on Ubuntu.

Suppose I do

echo "t" > test.txt

If I then run

file test.txt

I get

test.txt: ASCII text

If I then do

echo "å" > test.txt

and run file test.txt again, I get

test.txt: UTF-8 Unicode text

How does that happen? How does file "know" the encoding, or, alternatively, how does it guess it?

Thanks.
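
To make the question concrete, here is a minimal sketch in Python of the kind of content-based guess that would produce the two results above. It assumes that the shell wrote "å" as the two UTF-8 bytes C3 A5 (which is what a UTF-8 locale does) and that the guess is made purely from the bytes in the file. The function name guess_encoding and the label "unknown 8-bit data" are hypothetical, chosen for illustration; this is not file's actual implementation, which tries more encodings than the two shown here.

```python
def guess_encoding(data: bytes) -> str:
    """Rough, hypothetical approximation of a content-based encoding guess."""
    # Every byte below 0x80 means the content is plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ASCII text"
    # Otherwise, check whether the bytes form well-formed UTF-8 sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Placeholder label (an assumption): a real tool would go on to
        # try other encodings instead of giving up here.
        return "unknown 8-bit data"
    return "UTF-8 Unicode text"


# echo "t" writes the bytes 74 0A.
print(guess_encoding(b"t\n"))
# echo "å" in a UTF-8 locale writes the bytes C3 A5 0A.
print(guess_encoding("å\n".encode("utf-8")))
# The same letter saved as Latin-1 is the single byte E5, which is not valid UTF-8.
print(guess_encoding(b"\xe5\n"))
```

The point of the sketch is that nothing in the file records its encoding. A file containing only bytes below 0x80 is valid ASCII and valid UTF-8 at the same time, so any label is an inference from the bytes and can only be as good as the heuristic behind it.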

© Stack Overflow or respective owner