In UTF-8 collation, why is 11- less than 1-?

Posted by ??? on Super User
Published on 2011-01-01T13:32:38Z

I found that the sort result in ASCII is:

1-
11-

and in UTF-8:

11-
1-

This feels very counter-intuitive to me, and it isn't dictionary order either.

Isn't the character '-' (U+002D) always less than [0-9] (U+0030 to U+0039)? What's the general rule in UTF-8 collation?
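To make the ASCII result concrete: in the C/POSIX locale, tools like sort and ls compare strings byte by byte, which for ASCII text is the same as comparing code points. The Python sketch below uses Python's default string ordering as a stand-in for that C-locale comparison (an assumption for illustration; it does not model the UTF-8 locale's collation rules). It checks the code points mentioned above and reproduces the ASCII ordering of the two names.

```python
def show_code_points(chars):
    """Print each character with its Unicode code point."""
    for ch in chars:
        print(f"{ch!r}: U+{ord(ch):04X}")


def codepoint_sort(names):
    """Sort strings by code point.

    For ASCII text this matches byte order, i.e. what the C/POSIX
    locale uses; it is NOT how a UTF-8 locale collates.
    """
    return sorted(names)


if __name__ == "__main__":
    # '-' is U+002D; the digits are U+0030 to U+0039.
    show_code_points("-0123456789")

    # The first characters tie ('1' == '1'); at the second position
    # '-' (0x2D) < '1' (0x31), so "1-" sorts before "11-".
    print(codepoint_sort(["11-", "1-"]))  # ['1-', '11-']
```

Under plain code point (byte) order, '-' really is below every digit, which is why the ASCII listing puts 1- first; the UTF-8 result therefore has to come from the locale's collation rules rather than from the code point values.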

And how can I work around this on Linux, so that '-' sorts before [0-9] while the other characters keep their usual UTF-8 collation? (So that it affects the output of ls --sort, sort, etc.)

© Super User or respective owner