Parsing unicode XML with Python SAX on App Engine

Posted by Derek Dahmer on Stack Overflow
Published on 2010-04-13T18:28:13Z

I'm using xml.sax to parse XML held in unicode strings, which originally come from a web form. On my local machine (Python 2.5, using the default expat xmlreader, running through the App Engine development server), it works fine. However, the exact same code and input strings fail on the production App Engine servers with "not well-formed". For example, it happens with the code below:

from xml import sax

class MyHandler(sax.ContentHandler):
    pass

handler = MyHandler()

# Both of these unicode strings raise 'not well-formed'
# on App Engine, but work locally
sax.parseString(u"<a>b</a>", handler)
sax.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

# Both of these byte strings work, but the output is unicode
sax.parseString("<a>b</a>", handler)
sax.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

On App Engine, the unicode calls fail with this traceback:

  File "<string>", line 1, in <module>
  File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
    raise exception
SAXParseException: <unknown>:1:1: not well-formed (invalid token)

Is there any reason why App Engine's parser, which also uses Python 2.5 and expat, would fail when given unicode input?
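
Since the byte-string calls succeed in both environments, one thing to try is encoding the unicode input to UTF-8 before handing it to the parser. The sketch below is only a possible workaround, not a confirmed explanation of the App Engine behaviour. TextCollector and parse_unicode_xml are hypothetical names introduced for illustration, and the approach assumes the input carries no XML declaration naming a different encoding.

```python
from xml import sax


class TextCollector(sax.ContentHandler):
    """Hypothetical handler that gathers character data for inspection."""

    def __init__(self):
        sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # expat hands character data to the handler as unicode text
        self.chunks.append(content)


def parse_unicode_xml(xml_text, handler):
    # Assumption: xml_text has no XML declaration that names another
    # encoding, so UTF-8 bytes match what expat expects by default.
    sax.parseString(xml_text.encode("utf-8"), handler)


collector = TextCollector()
parse_unicode_xml(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", collector)
text = u"".join(collector.chunks)
```

The handler still receives unicode text, so the rest of the code does not need to change. Only the boundary where the string enters the parser is affected.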
