Parse XML with ElementTree, custom sorting

Posted by microspace on Stack Overflow
Published on 2012-06-21T19:19:54Z Indexed on 2012/06/26 15:16 UTC

I want to parse a UTF-8 XML file and sort its entries by one field. The sort order follows a custom alphabet (s1 in the source code below). The history of this question is here: "sorting of list containing utf-8 characters". I found how to sort XML here. The sorting itself works correctly; the problem is with elementtree, which I have to admit does not work for me on Python 3.

Here is the source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#import xml.etree.ElementTree as ET   # Python 2.5
import elementtree.ElementTree as ET
s1='aáàAâÂbBcCçÇdDeéEfFgGgGhHiIîÎíiiIjJkKlLmMnNóoOöÖpPqQrRsSsStTuUûúÛüÜvVwWxXyYzZ'
s2='11111122334455666aabbccddeeeeeeffgghhiijjkklllllmmnnooppqqrrsssssttuuvvwwxxyy'
trans = str.maketrans(s1, s2)
def unikey(seq):
    return seq[0].translate(trans)
tree = ET.parse("tosort.xml")
container = tree.find("entries")
data = []
for elem in container:
    keyd = elem.findtext("k")
    data.append((keyd, elem))
print (data)
data.sort(key=unikey)
print (data)
container[:] = [item[-1] for item in data]
tree.write("sorted.xml", encoding="utf-8")
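The custom-alphabet ordering described above can also be expressed without a translation table. The following is only a sketch under stated assumptions: `ALPHABET` is a hypothetical, shortened alphabet (not the full s1 string), case is ignored, and `alphabet_key` is an illustrative name. Each character is ranked by its position in the alphabet string, and characters outside the alphabet sort after all known ones.

```python
# Hypothetical, shortened alphabet: a character sorts by its position here.
ALPHABET = "aáàâbcçdeéfghiîíjklmnoóöpqrstuûúüvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def alphabet_key(word):
    # Map every character to (rank, character). Characters missing from
    # ALPHABET get the rank len(ALPHABET), so they sort after known ones,
    # with the character itself as a tie-breaker.
    return [(RANK.get(ch, len(ALPHABET)), ch) for ch in word.lower()]


words = ["zaaaa", "çaaaa", "saaaa", "caaaa"]
print(sorted(words, key=alphabet_key))
# ['caaaa', 'çaaaa', 'saaaa', 'zaaaa']
```

With the default string ordering, "çaaaa" would land after "zaaaa", because ç has a higher code point than z.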

Instructions for importing the elementtree module are here. When I import the module with import xml.etree.ElementTree as ET, I get this message:

Traceback (most recent call last):
  File "pcs.py", line 19, in <module>
    container[:] = [item[-1] for item in data]
  File "/usr/lib/python3.1/xml/etree/ElementTree.py", line 210, in __setitem__
    assert iselement(element)
AssertionError
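The traceback suggests that the slice assignment container[:] = [...] reaches Element.__setitem__ with a whole list, which the assertion rejects because a list is not an element. One possible workaround, sketched here under assumptions, is to avoid slice assignment altogether: remove the children and append them back in sorted order. The inline sample document and the variable names are illustrative and not part of the original script, and a plain string key stands in for the custom-alphabet key.

```python
import xml.etree.ElementTree as ET

# Illustrative two-entry document modelled on tosort.xml.
SAMPLE = (
    "<xdxf><entries>"
    "<ar><k>zaaaa</k>definition1</ar>"
    "<ar><k>saaaa</k>definition2</ar>"
    "</entries></xdxf>"
)

root = ET.fromstring(SAMPLE)
container = root.find("entries")

# Sort a copy of the child list by the text of each <k> element.
# A custom-alphabet key function could be used here instead.
ordered = sorted(list(container), key=lambda elem: elem.findtext("k"))

# Replace the children one by one instead of using container[:] = ...
for child in list(container):
    container.remove(child)
for child in ordered:
    container.append(child)

print([elem.findtext("k") for elem in container])
# ['saaaa', 'zaaaa']
```

Removing and re-appending keeps each element's text and tail intact, so the definitions stay attached to their entries.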

When I import it with import elementtree.ElementTree as ET instead, I get this message:

Traceback (most recent call last):
  File "pcs.py", line 4, in <module>
    import elementtree.ElementTree as ET
  File "/usr/local/lib/python3.1/dist-packages/elementtree/ElementTree.py", line 794, in <module>
    _escape = re.compile(eval(r'u"[&<>\"\u0080-\uffff]+"'))
  File "<string>", line 1
    u"[&<>\"\u0080-\uffff]+"
                           ^
SyntaxError: invalid syntax

I am using Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10). With Python 2.6, elementtree works without a problem.

Contents of tosort.xml:

<xdxf>
<entries>
<ar><k>zaaaa</k>definition1</ar>
<ar><k>saaaa</k>definition2</ar>
...
...
</entries>
</xdxf>

© Stack Overflow or respective owner