JavaScript-aware HTML parser for Python

Posted by znetor on Stack Overflow
Published on 2010-12-28T00:54:54Z Indexed on 2010/12/28 1:53 UTC

Filed under: JavaScript | python | xpath | lxml

<html>
<head>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">f*** js</a>');
    document.write("f*** js!");
    </script>
</head>
<body>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">f*** js</a>');
    document.write("f*** js!");
    </script>
<div><a href="http://www.google.com">f*** js</a></div>
</body>
</html>

I want to use XPath to select all of the <a> elements in the HTML page above:

In [1]: import lxml.html as H

In [2]: f = open("test.html","r")

In [3]: c = f.read()

In [4]: doc = H.document_fromstring(c)

In [5]: doc.xpath('//a')
Out[5]: [<Element a at a01d17c>]

In [6]: a = doc.xpath('//a')[0]

In [7]: a.getparent()
Out[7]: <Element div at a01d41c>

I only get the one <a> element that is not generated by JavaScript, but the Firefox XPath Checker add-on finds all of the <a> elements, including the ones written by the scripts:
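
The difference comes from the fact that an HTML parser does not execute scripts, whereas Firefox does. A minimal sketch of that behaviour follows. It uses the standard library's html.parser instead of lxml so that it runs without extra packages. The class name LinkCounter and the shortened page are made up for illustration. The body of a <script> element arrives as plain text, so the <a> inside document.write(...) is never reported as a tag, which matches the single result in the lxml session above.

```python
from html.parser import HTMLParser

# Shortened, hypothetical version of the page from the question.
PAGE = """<html>
<head>
    <script type="text/javascript">
    document.write('<a href="http://www.google.com">generated</a>');
    </script>
</head>
<body>
<div><a href="http://www.google.com">static</a></div>
</body>
</html>"""


class LinkCounter(HTMLParser):
    """Collect href values of <a> start tags and the raw text of <script> bodies."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are delivered here as text; they are never parsed as markup.
        if self._in_script:
            self.script_text.append(data)


parser = LinkCounter()
parser.feed(PAGE)
parser.close()

static_links = parser.hrefs                   # only the <a> that is in the markup itself
script_source = "".join(parser.script_text)   # the document.write(...) call, as text

print(len(static_links))
print("document.write" in script_source)
```

The link inside the script exists only as part of a JavaScript string until something actually runs the script.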

http://i.imgur.com/0hSug.png

How can I get the same result in Python? Thanks!

<html>
<head>
</head>
<body>
<script language="javascript">
function over(){
a.innerHTML="mouse me"
}
function out(){
a.innerHTML="<a href='http://www.google.com'>google</a>"
}
</script>
<li id="a" onmouseover="over()" onmouseout="out()">mouse me</li>
</body>
</html>
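
For the first page, one partial workaround is to pull the string literals out of the document.write(...) calls and parse them as extra markup. The sketch below is only that, a sketch: it assumes every call passes a single quoted string literal with no escaped quotes, it does not run any JavaScript, and the names WRITE_CALL, Collector, collect and links_with_written_markup as well as the example.com URLs are hypothetical. It again uses the standard library's html.parser so that it is self-contained; the same idea should carry over to lxml by parsing each extracted string as a fragment.

```python
import re
from html.parser import HTMLParser

# Matches document.write('...') or document.write("...") with one plain string literal.
WRITE_CALL = re.compile(r"""document\.write\(\s*(['"])(.*?)\1\s*\)""", re.S)

# Hypothetical page shaped like the first example in the question.
PAGE = """<html>
<head>
    <script type="text/javascript">
    document.write('<a href="http://example.com/head">written in head</a>');
    document.write("plain text");
    </script>
</head>
<body>
    <script type="text/javascript">
    document.write('<a href="http://example.com/body">written in body</a>');
    </script>
<div><a href="http://example.com/static">static</a></div>
</body>
</html>"""


class Collector(HTMLParser):
    """Collect href values of <a> tags and the raw text of <script> bodies."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)


def collect(markup):
    """Return (hrefs, script source) for a piece of HTML."""
    parser = Collector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs, "\n".join(parser.scripts)


def links_with_written_markup(page):
    """Static links first, then links found inside document.write string literals."""
    hrefs, script_source = collect(page)
    for match in WRITE_CALL.finditer(script_source):
        written_hrefs, _ = collect(match.group(2))
        hrefs.extend(written_hrefs)
    return hrefs


all_links = links_with_written_markup(PAGE)
print(all_links)
```

This approach cannot handle the second page, where the link only appears after an onmouseout handler assigns to innerHTML. Markup created by event handlers or other DOM manipulation needs a real JavaScript engine, for example a browser driven from Python with a tool such as Selenium, after which the rendered DOM can be queried with XPath.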

