libxml2dom - Developer IT

Search Results

Hello, I have HTML content in a Python variable. Is it possible to use the DOM on it? As I understand it, libxml2dom is the tool for this. Now the question: my HTML contains a div with id = 'some_needed_block'. In the Python script:

    pageData = someHandler.read()
    pageDOM = libxml2dom.parseString(pageData, html=1)
    print pageDOM
    -> <libxml2dom.Document object at 0x2d160d0>
    block = pageDOM.getElementById('some_needed_block')
    print block
    -> <libxml2dom.Node object at 0xf5d1d0>

    def collect_text(node):
        s = ""
        for child_node in node.childNodes:
            if child_node.nodeType == child_node.TEXT_NODE:
                s += child_node.nodeValue
            else:
                s += collect_text(child_node)
        return s

    collect_text(block)
    -> for child_node in node.childNodes:
    -> AttributeError: 'NoneType' object has no attribute 'childNodes'
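The recursive traversal in the question can be sketched in a runnable form. This is only a minimal sketch, assuming Python 3 and the standard-library xml.dom.minidom as a stand-in for libxml2dom, whose own behaviour is not reproduced here. The find_by_id helper is hypothetical and exists only because minidom's getElementById resolves nothing unless the attribute is declared as an ID. The None guard shows where an error of this kind would surface; it does not explain why a None reached collect_text in the original script.

```python
from xml.dom import minidom


def collect_text(node):
    """Recursively concatenate the text of all text nodes under `node`."""
    if node is None:
        # A failed lookup yields None; treat it as "no text" instead of crashing.
        return ""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.nodeValue)
        else:
            parts.append(collect_text(child))
    return "".join(parts)


def find_by_id(node, wanted_id):
    """Hypothetical helper: depth-first search for an element with a matching id attribute."""
    if node.nodeType == node.ELEMENT_NODE and node.getAttribute("id") == wanted_id:
        return node
    for child in node.childNodes:
        found = find_by_id(child, wanted_id)
        if found is not None:
            return found
    return None


# Well-formed sample markup (an assumption; minidom is an XML parser, not an HTML one).
page_data = '<html><body><div id="some_needed_block">Hello <b>DOM</b> world</div></body></html>'
page_dom = minidom.parseString(page_data)

block = find_by_id(page_dom, "some_needed_block")
text = collect_text(block)

# A lookup that finds nothing returns None, which collect_text now tolerates.
missing_text = collect_text(find_by_id(page_dom, "no_such_block"))
```

Collecting the pieces in a list and joining them once avoids repeated string concatenation, but the traversal order is the same as in the question's version.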

HTML parser for GAE

- by Richard

Generally I use lxml for my HTML parsing needs, but that isn't available on Google App Engine. The obvious alternative is BeautifulSoup, but I find it chokes too easily on malformed HTML. Currently I am testing libxml2dom and have been getting better results. Which pure Python HTML parser have you found performs best? My priority is the ability to handle bad HTML over speed.
