JavaScript-aware HTML parser for Python

Posted by znetor on Stack Overflow
Published on 2010-12-28T00:54:54Z

<html>
<head>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">f*** js</a>');
    document.write("f*** js!");
    </script>
</head>
<body>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">f*** js</a>');
    document.write("f*** js!");
    </script>
<div><a href="http://www.google.com">f*** js</a></div>
</body>
</html>

I want to use XPath to find all the <a> elements in the HTML page above, including the ones written by JavaScript:

In [1]: import lxml.html as H

In [2]: f = open("test.html","r")

In [3]: c = f.read()

In [4]: doc = H.document_fromstring(c)

In [5]: doc.xpath('//a')
Out[5]: [<Element a at a01d17c>]

In [6]: a = doc.xpath('//a')[0]

In [7]: a.getparent()
Out[7]: <Element div at a01d41c>

I only get the one link that is not generated by JavaScript, but the Firefox XPath Checker add-on finds all of the links.
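The behaviour above can be reproduced without lxml. The following is a minimal sketch that uses the standard library's html.parser in place of lxml (a swapped-in parser, chosen only so the snippet is self-contained). PAGE is a trimmed copy of the test page with shortened link text, and LinkCounter is a hypothetical helper name. It suggests why a static parser reports a single link: the document.write calls are never executed, so the markup inside them stays plain script text.

```python
from html.parser import HTMLParser

# Trimmed copy of the test page (link text shortened for illustration).
PAGE = """<html>
<head>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">js link</a>');
    </script>
</head>
<body>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">js link</a>');
    </script>
<div><a href="http://www.google.com">static link</a></div>
</body>
</html>"""


class LinkCounter(HTMLParser):
    """Hypothetical helper: records real <a> tags and raw script text."""

    def __init__(self):
        super().__init__()
        self.hrefs = []          # href of every <a> the parser sees as a tag
        self.script_chunks = []  # text found inside <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are delivered as plain data, never as tags.
        if self._in_script:
            self.script_chunks.append(data)


parser = LinkCounter()
parser.feed(PAGE)
parser.close()

static_links = parser.hrefs
script_text = "".join(parser.script_chunks)

print("links seen as tags:", len(static_links))
print("document.write calls left as text:", script_text.count("document.write"))
```

Firefox reports more links because it runs the scripts before XPath Checker queries the resulting DOM. Getting the same result from Python would presumably require something that executes the JavaScript first, such as a scripted browser, and then hands the rendered markup to the parser.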

http://i.imgur.com/0hSug.png

How can I do that in Python? Thanks!

<html>
<head>
</head>
<body>
<script language="javascript">
function over(){
a.innerHTML="mouse me"
}
function out(){
a.innerHTML="<a href='http://www.google.com'>google</a>"
}
</script>
<li id="a" onmouseover="over()" onmouseout="out()">mouse me</li>
</body>
</html>

© Stack Overflow or respective owner
