Getting data from a webpage in a stable and efficient way
Posted by Mike Heremans on Programmers
Published on 2012-06-06T07:59:43Z
Recently I learned that using a regex to parse a website's HTML to get the data you need isn't the best course of action.
So my question is simple: what, then, is the most efficient and generally stable way to get this data?
I should note that:
- There are no APIs
- There is no other source I can get the data from (no databases, feeds, and such)
- There is no access to the source files (the data comes from public websites)
- Let's say the data is plain text, displayed in a table on an HTML page
I'm currently using Python for my project, but a language-independent solution or tips would be nice.
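To make the scenario concrete, here is a minimal sketch of the kind of extraction I mean, done with Python's standard-library `html.parser` instead of a regex. The `TableExtractor` class, the `extract_rows` helper, and the sample markup are all made up for illustration; real pages are messier (nested tables, broken markup), so treat this as one possible starting point and not as the answer.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table cells in `html` as a list of rows (hypothetical helper)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page content, standing in for the downloaded HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The point of the sketch is that the parser tracks tag structure for me, so the extraction does not depend on the exact whitespace or attribute order in the markup the way a regex would.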
As a side question: How would you go about it when the webpage is constructed by Ajax calls?
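For the Ajax case, my assumption is that the page fills its table from some endpoint that returns JSON, in which case the response body could be parsed directly instead of the rendered HTML. The sketch below only covers the parsing step; the payload shape, the field names, and the `rows_from_json` helper are hypothetical, and whether a given site exposes such an endpoint is exactly what I don't know.

```python
import json


def rows_from_json(payload, fields):
    """Turn a JSON array of objects into table rows (hypothetical helper).

    `payload` is the raw response text; `fields` lists the keys to keep,
    in column order. Missing keys become empty strings.
    """
    records = json.loads(payload)
    return [[str(record.get(field, "")) for field in fields] for record in records]


# Made-up response body, standing in for what an Ajax call might return.
SAMPLE_PAYLOAD = '[{"name": "Widget", "price": 4.5}, {"name": "Gadget", "price": 12.0}]'

table = rows_from_json(SAMPLE_PAYLOAD, ["name", "price"])
for row in table:
    print(row)
```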
© Programmers or respective owner