How to get a html elements with python lxml

Posted by Damiano on Stack Overflow See other posts from Stack Overflow or by Damiano
Published on 2010-05-10T23:50:03Z Indexed on 2010/05/10 23:54 UTC
Read the original article Hit count: 438

Filed under:

python

|

Xml

|

lxml

Hello!

I have this html code:

<table>
 <tr>
  <td class="test"><b><a href="">aaa</a></b></td>
  <td class="test">bbb</td>
  <td class="test">ccc</td>
  <td class="test"><small>ddd</small></td>
 </tr>
 <tr>
  <td class="test"><b><a href="">eee</a></b></td>
  <td class="test">fff</td>
  <td class="test">ggg</td>
  <td class="test"><small>hhh</small></td>
 </tr>
</table>

I use this Python code to extract all <td class="test"> with lxml module.

import urllib2
import lxml.html

code   = urllib.urlopen("http://www.example.com/page.html").read()
html   = lxml.html.fromstring(code)
result = html.xpath('//td[@class="test"][position() = 1 or position() = 4]')

It works good! The result is:

<td class="test"><b><a href="">aaa</a></b></td>
<td class="test"><small>ddd</small></td>


<td class="test"><b><a href="">eee</a></b></td>
<td class="test"><small>hhh</small></td>

(so the first and the fourth column of each <tr>) Now, I have to extract:

aaa (the title of the link)

ddd (text between <small> tag)

eee (the title of the link)

hhh (text between <small> tag)

How could I extract these values?

(the problem is that I have to remove <b> tag and get the title of the anchor on the first column and remove <small> tag on the forth column)

Thank you!

© Stack Overflow or respective owner

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More

Related posts about Xml

gwt+xml- can i read through incomplete XML using the GWT XML Parser

as seen on Stack Overflow - Search for 'Stack Overflow'
I have a requirement where a user is typing in XML in a text area, and I want to show the various nodes in a tree...But as the user is typing in the xml, it wont be a complete xml (since he is still typing in the XML)... How do I read an incomplete XML and correctly generate the tree? I understand… >>> More
Store XML,update record in XML,retrive a specific record in XML stored on BB device

as seen on Stack Overflow - Search for 'Stack Overflow'
I am writing a blackberry application where i want to store the data returned by a web service in my BB device.Earlier i was going to use SQLite for storing the data in mobile but as i googled and also did programming using SQLite and found that some BB devices dont support SQLite library and fail… >>> More
perl xml parser get xml content within xml

as seen on Stack Overflow - Search for 'Stack Overflow'
How can I use XMLParser to get the item-@url, item-@replace and item-"value inside" for the content as a string of the node where item-@cone="one"? <cstep> <item cone="one" url="http://google.com/{ccc}/cthree" replace="{ccc}"> <itemsub conesub="conesub"> … >>> More
Reading php generated XML in flash?

as seen on Stack Overflow - Search for 'Stack Overflow'
Here is part 1 of our problem (Loading a dynamically generated XML file as PHP in Flash). Now we were able to get Flash to read the XML file, but we can only see the Flash render correctly when tested(test movie) from the actual Flash program. However, when we upload our files online to preview the… >>> More
Announcing RSS feeds of Microsoft All-In-One Code Framework code samples

as seen on Geeks with Blogs - Search for 'Geeks with Blogs'
Today, we are not only announcing Sample Browser v2 CTP, but we are also excited to announce the availability of RSS feeds of All-In-One Code Framework code samples. By using these feeds, you can easily track and download the new code samples. English RSS feeds All code samples: http://support… >>> More