Getting data from a webpage in a stable and efficient way

Posted by Mike Heremans on Programmers
Published on 2012-06-06T07:59:43Z Indexed on 2012/06/06 10:48 UTC


Recently I've learned that using a regex to parse a website's HTML to get the data you need isn't the best course of action.

So my question is simple: what, then, is the best, most efficient, and generally stable way to get this data?

I should note that:

  • There are no APIs
  • There is no other source I can get the data from (no databases, feeds, and such)
  • There is no access to the source files (the data comes from public websites)
  • Let's say the data is plain text, displayed in a table in an HTML page

I'm currently using Python for my project, but a language-independent solution or tips would be nice.
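
To make the scenario concrete, here is a minimal sketch of the kind of non-regex approach I have in mind, using only Python's standard-library `html.parser`. The `TableExtractor` class, the `extract_rows` helper, and the sample markup are all made up for illustration; a real page would likely need more care (nested tables, malformed markup, cells with markup inside them).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in `html` as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup standing in for the public page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

The parser walks the document as a stream of tags and text rather than matching patterns against the raw string, so attribute order or extra whitespace in the markup should not matter.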

As a side question: How would you go about it when the webpage is constructed by Ajax calls?
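
For the Ajax case, my guess is that one option is to skip the rendered page and decode whatever response the page's own script requests. The sketch below assumes that response is JSON shaped like the sample; the payload, the `rows` key, and the `rows_from_payload` helper are hypothetical, and the actual fetch (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
import json

# Hypothetical response body, as an Ajax endpoint might return it.
SAMPLE_PAYLOAD = '{"rows": [{"name": "Widget", "price": 4.5}, {"name": "Gadget", "price": 12.0}]}'


def rows_from_payload(payload_text, key="rows"):
    """Decode a JSON payload and return the list stored under `key`.

    Returns an empty list if the key is missing, so callers can treat
    "no data" and "unexpected shape" the same way in this sketch.
    """
    payload = json.loads(payload_text)
    return payload.get(key, [])


if __name__ == "__main__":
    for row in rows_from_payload(SAMPLE_PAYLOAD):
        print(row["name"], row["price"])
```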

© Programmers or respective owner
