Python minidom and UTF-8 encoded XML with hash references

Posted by Jakob Simon-Gaarde on Stack Overflow
Published on 2011-01-11T22:15:56Z

I am having some difficulty in a home project where I need to parse a SOAP request. The SOAP is generated with gSOAP and includes string parameters with special characters such as the Danish letters "æøå".

gSOAP builds SOAP requests with UTF-8 encoding by default, but instead of sending the special characters as raw bytes (i.e. the bytes C3 A6 for the character "æ"), it sends what I think are called numeric character references (i.e. &#195;&#166;).
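In XML, a numeric character reference such as &#230; names a Unicode code point, not a byte. The following Python 3 sketch contrasts escaping "æ" by its code point with escaping each byte of its UTF-8 encoding separately, which is the pattern the request appears to contain. The helper names are hypothetical, not gSOAP or standard-library functions.

```python
def code_point_refs(text):
    """Escape each non-ASCII character as one reference to its code point."""
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)


def utf8_byte_refs(text):
    """Escape each byte of the UTF-8 encoding as its own reference
    (the pattern seen in the gSOAP request)."""
    return "".join(
        chr(b) if b < 128 else "&#%d;" % b for b in text.encode("utf-8")
    )


print(code_point_refs("æble"))  # &#230;ble
print(utf8_byte_refs("æble"))   # &#195;&#166;ble
```

A conforming XML parser reads the first form as "æble" and the second as three separate characters U+00C3, U+00A6 and "ble".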

I don't completely understand why gSOAP does it this way, since it marks the incoming payload as UTF-8 encoded anyway (Content-Type: text/xml; charset=utf-8), but that is beside the point (I think).

Perhaps gSOAP is just obeying some transport rule here?

When I parse the request from gSOAP in Python with xml.dom.minidom.parseString(), I get element values as unicode objects, which is fine, but the character references are not decoded as UTF-8. The parser turns each reference into one character but does not decode the resulting sequence afterwards, so I end up with a unicode string whose characters are the individual UTF-8 bytes:

So if the string "æble" is contained in the XML, it arrives in the request like this:

"&#195;&#166;ble"

After parsing the XML, the unicode string in the DOM Text node's data member looks like this:

u'\xc3\xa6ble'
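The behaviour can be reproduced in a few lines of Python 3, where str plays the role of the Python 2 unicode type. The element name word is a hypothetical stand-in for the real SOAP element. Each reference becomes the single code point it names, so &#195; turns into U+00C3 rather than into a byte, which is what the XML specification asks of the parser.

```python
from xml.dom import minidom

# Hypothetical request fragment; <word> stands in for the real SOAP element.
request = b'<?xml version="1.0" encoding="utf-8"?><word>&#195;&#166;ble</word>'

doc = minidom.parseString(request)
# Join all text children in case the data was split across several nodes.
value = "".join(
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == node.TEXT_NODE
)
print(ascii(value))  # '\xc3\xa6ble'
```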

I would expect it to look like this:

u'\xe6ble'
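If the sender cannot be changed, one possible post-parse workaround is sketched below in Python 3, under the assumption that every reference in the payload encodes a single UTF-8 byte. Latin-1 maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF, so encoding with it recovers the original bytes, which can then be decoded as UTF-8. The name reinterpret_as_utf8 is hypothetical.

```python
def reinterpret_as_utf8(value):
    """Treat each character of value as one byte and decode the result as UTF-8.

    Assumes every character is in U+0000..U+00FF, i.e. that every reference
    in the source stood for a single UTF-8 byte. Raises UnicodeEncodeError or
    UnicodeDecodeError when that assumption does not hold.
    """
    return value.encode("latin-1").decode("utf-8")


fixed = reinterpret_as_utf8("\xc3\xa6ble")
print(ascii(fixed))  # '\xe6ble'
```

The catch is that this breaks text that was escaped correctly: a proper &#233; ("é") becomes the single byte 0xE9, which is not valid UTF-8 on its own, so decoding fails.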

What am I doing wrong? Should I unescape the SOAP XML before parsing it, or should I be looking for the solution somewhere else, maybe in gSOAP?
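Fixing the sender is probably the cleaner route; gSOAP's SOAP_C_UTFSTRING mode flag, which declares that char* and std::string data already hold UTF-8, may be relevant and is worth checking in the gSOAP documentation. If the XML has to be repaired on the Python side before parsing, a rough Python 3 sketch is to find runs of decimal references in the 128 to 255 range, decode each run as UTF-8, and re-emit one reference per resulting character. It assumes decimal (not hexadecimal) references and leaves any run that is not valid UTF-8 untouched; fix_byte_refs is a hypothetical name.

```python
import re
from xml.dom import minidom

# A run of decimal references whose values fall in 128..255 (possible UTF-8 bytes).
_BYTE_REF_RUN = re.compile(r"(?:&#(?:12[89]|1[3-9][0-9]|2[0-4][0-9]|25[0-5]);)+")


def fix_byte_refs(xml_text):
    """Rewrite runs like '&#195;&#166;' as one reference per decoded character."""
    def repl(match):
        raw = bytes(int(n) for n in re.findall(r"[0-9]+", match.group(0)))
        try:
            decoded = raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: leave the run untouched
        return "".join("&#%d;" % ord(c) for c in decoded)

    return _BYTE_REF_RUN.sub(repl, xml_text)


request = "<word>&#195;&#166;ble</word>"
fixed = fix_byte_refs(request)
print(fixed)  # <word>&#230;ble</word>

doc = minidom.parseString(fixed)
doc.normalize()  # merge adjacent text nodes before reading firstChild
print(ascii(doc.documentElement.firstChild.data))  # '\xe6ble'
```

Re-emitting references rather than raw characters keeps the repaired XML ASCII-only, so its declared encoding does not matter. The heuristic cannot tell a genuine sequence such as "Ã¦" escaped by code point from a byte-wise "æ", so it is only safe when the sender is known to escape byte by byte.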

Thanks in advance.

Best regards,
Jakob Simon-Gaarde
