sort utility on Cyrillic text

Posted by Anton on Super User
Published on 2010-03-16T14:17:48Z Indexed on 2010/03/16 14:26 UTC

I have to sort some lines of Cyrillic text and I want to use the sort utility (on Mac OS X 10.6). The problem is that the result is incorrect. I copy the text to the clipboard and then run pbpaste | sort. The data is plain text, and I also tried passing a file to the sort command directly.

My source data is:

???????
?????
????
????
??????
???????
????????
?????? ? ????? ???????????????
??????????
????
??????

And after sorting I get:

????
????
????
?????
??????
??????
?????? ? ????? ???????????????
???????
???????
????????
??????????

These lines aren’t even grouped by first letter. I tried the -d option, but then I get an error:

sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\320\321\321\321' and `\320\320\320\321\321\320'.

Exporting the variable as recommended doesn’t solve the problem. How can I get the sort utility to handle this task? Is any additional information needed?
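
For reference, the workaround named in the error message can also be applied to the sort process alone instead of being exported. The following is a minimal sketch, assuming the input reaches sort as valid UTF-8. The sample words are hypothetical and are not the data from this question. With LC_ALL=C, sort compares raw bytes, which for UTF-8 means code point order. That matches alphabetical order for the basic Russian letters within one case, but it puts uppercase before lowercase and misplaces ё.

```shell
# Hypothetical UTF-8 sample lines (not the poster's data).
input=$(printf '%s\n' 'яблоко' 'груша' 'арбуз' 'вишня')

# LC_ALL=C is set only for the sort process, so it compares raw bytes
# and never attempts locale-aware collation.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

With clipboard data the same idea would read pbpaste | LC_ALL=C sort. Whether this changes the behavior described above on Mac OS X 10.6 is not confirmed here.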

© Super User or respective owner
