In UTF-8 collation, why is 11- less than 1-?

Posted by ??? on Super User
Published on 2011-01-01T13:32:38Z Indexed on 2011/01/01 13:55 UTC

Filed under: unicode | utf8 | collation

I found that the sort result in ASCII is:

1-
11-

and in UTF-8 it is:

11-
1-

This feels very counter-intuitive to me, and it's not dictionary order.

Isn't the character '-' (002d) always less than [0-9] (0030-0039)? What's the general rule in UTF-8 collation?
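To make the code point reasoning above concrete, here is a minimal Python sketch. It only checks the numeric values quoted in the question and sorts the two sample strings by code point and by their UTF-8 bytes. The variable names are illustrative, and it does not reproduce any locale's collation rules.

```python
# Code point of '-' and of the digits, as quoted in the question.
hyphen = ord("-")                              # 0x2d
digits = [ord(c) for c in "0123456789"]        # 0x30 .. 0x39

# '-' is numerically below every digit.
hyphen_below_digits = all(hyphen < d for d in digits)

names = ["11-", "1-"]

# Python compares str values code point by code point.
by_code_point = sorted(names)

# Sorting by the raw UTF-8 bytes gives the same order for these strings.
by_utf8_bytes = sorted(names, key=lambda s: s.encode("utf-8"))

print(hex(hyphen), hyphen_below_digits)
print(by_code_point)
print(by_utf8_bytes)
```

Both sorts put 1- before 11-, so the UTF-8 encoding alone does not reverse the order. That suggests the difference comes from the collation rules of the active locale rather than from the byte values.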

And how can I bypass it in Linux, so that - is less than [0-9] while all other characters keep their UTF-8 ordering? (It should affect the results of ls --sort, sort, etc.)
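One commonly used workaround, sketched here in Python as an assumption rather than a confirmed answer to this question, is to switch only the collation category to the "C" locale, which compares strings byte by byte. Note that this changes the ordering of every character, not just '-', so it only partly matches the request above.

```python
import locale

names = ["11-", "1-"]

# Switch only the collation category to the "C" locale, which is
# always available and orders strings by their byte values.
locale.setlocale(locale.LC_COLLATE, "C")

# locale.strxfrm turns each string into a key that sorts according
# to the current LC_COLLATE setting.
in_c_locale = sorted(names, key=locale.strxfrm)

print(in_c_locale)
```

For command-line tools such as ls and sort, the corresponding setting is the LC_COLLATE environment variable (for example LC_COLLATE=C), which LC_ALL overrides if it is set.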

© Super User or respective owner
