Python lxml - XPath returns empty list

Posted by Chris Finlayson on Stack Overflow
Published on 2014-08-18T15:55:11Z

Filed under: python | screen-scraping | lxml

I cannot figure out what is wrong with my XPath when trying to extract a value from a table on a web page. The method seems correct, since I can extract the page title and other attributes, but I cannot extract the third value: it always returns an empty list.

from lxml import html
import requests

test_url = 'SC312226'

page = ('https://www.opencompany.co.uk/company/'+test_url)

print('Now searching URL: ' + page)

data = requests.get(page)
tree = html.fromstring(data.text)

print(tree.xpath('//title/text()'))  # Get page title
print(tree.xpath('//a/@href'))  # Get href attribute of all links
print(tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'))  # Always prints []
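A likely cause of this symptom (a common explanation, not verified against this particular page) is `<tbody>`: browsers following the HTML5 parsing rules insert a `<tbody>` element when a `<tr>` appears directly inside a `<table>`, while lxml's default HTML parser keeps the markup as the server wrote it. The sketch below uses a hypothetical `RAW_HTML` snippet, invented to mirror the shape of the XPath in the question, to show the same path failing with the `tbody` steps and matching without them.

```python
from lxml import html

# Hypothetical markup modeled on the question's XPath: the raw HTML puts
# <tr> directly inside <table>, with no <tbody> tags anywhere.
RAW_HTML = """
<html><body>
<div id="financial">
  <table>
    <tr><td>
      <table>
        <tr><td>Header</td></tr>
        <tr><td><div>Label</div><div>£432,272</div></td></tr>
      </table>
    </td></tr>
  </table>
</div>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# Path written against a browser DOM, where <tbody> was inserted by the
# browser's parser; lxml did not insert it, so nothing matches.
with_tbody = tree.xpath(
    '//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'
)

# The same path with the tbody steps removed matches the raw markup.
without_tbody = tree.xpath(
    '//*[@id="financial"]/table/tr/td[1]/table/tr[2]/td[1]/div[2]/text()'
)

print(with_tbody)  # []
print(without_tbody)
```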

Unless I'm missing something, the XPath appears to be correct:

[Chrome screenshot]

I checked in the Chrome console and the XPath appears to work, so I'm at a loss:

$x ('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
[
"£432,272"
]
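Note that `$x` evaluates the XPath against Chrome's live DOM, not against the raw HTML that `requests` downloads, and Chrome may have added `<tbody>` elements the source never contained. Assuming that is what happened here (an assumption, not confirmed for this site), a stdlib-only sketch of a workaround is to check the raw HTML for `<tbody>` and, if it is absent, drop the `tbody` steps from the browser-derived path. The helper names `source_has_tbody` and `strip_tbody_steps` are hypothetical, made up for this illustration.

```python
import re

# Matches a "/tbody" location step, optionally with a positional predicate
# such as "/tbody[1]", but not a longer element name such as "/tbodyx".
_TBODY_STEP = re.compile(r'/tbody(?:\[\d+\])?(?=/|$)')


def source_has_tbody(raw_html):
    """Hypothetical helper: True if the raw HTML contains a <tbody> tag.

    A crude, case-insensitive substring check, only meant to decide whether
    a path written against the browser DOM needs adjusting.
    """
    return '<tbody' in raw_html.lower()


def strip_tbody_steps(xpath):
    """Hypothetical helper: remove /tbody steps from an XPath string."""
    return _TBODY_STEP.sub('', xpath)


copied = '//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'
print(strip_tbody_steps(copied))
# //*[@id="financial"]/table/tr/td[1]/table/tr[2]/td[1]/div[2]/text()
```

In the question's script this would be applied as `if not source_has_tbody(data.text): path = strip_tbody_steps(path)` before calling `tree.xpath(path)`. Stripping unconditionally is avoided because a page whose source does contain `<tbody>` needs those steps kept.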

© Stack Overflow or respective owner
