Parsing a website

Posted by Phenom on Stack Overflow
Published on 2010-04-06T20:08:29Z Indexed on 2010/04/06 20:13 UTC

I want to write a program that takes a website URL as user input. The program fetches that page, parses its contents, and writes a new HTML file built from the information it finds.

Specifically, the program will pick out certain links from the page, put them in the output HTML file, and discard everything else.
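To make the idea concrete, here is a minimal sketch of that fetch, filter, and write pipeline using only Python's standard library. Python is just one possible choice, and the names `LinkCollector`, `extract_links`, `render_links_page`, `fetch`, and the `keep` predicate are made up for illustration; which links count as "certain links" is an assumption left to the caller.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects (absolute_url, link_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url, keep=lambda url: True):
    """Return the (url, text) pairs whose URL passes the keep() predicate."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [(url, text) for url, text in parser.links if keep(url)]


def render_links_page(links, title="Extracted links"):
    """Build a small HTML document that contains only the given links."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(text or url)}</a></li>'
        for url, text in links
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


def fetch(url):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A run might look like `html = fetch(url)`, then `links = extract_links(html, url, keep=lambda u: u.endswith(".pdf"))`, then writing `render_links_page(links)` to a file. The built-in parser tolerates a lot of sloppy markup, but a third-party parsing library may be more convenient for heavily malformed pages.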

For now I only need it to work on websites that don't require a login, but later I want it to work on sites where you have to log in, so it will need to handle cookies.
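As a rough illustration of the cookie requirement, the sketch below builds a Python `urllib` opener backed by a cookie jar, so that cookies set by a login response are sent back on later requests. The helper names are invented for this example, and the form field names `username` and `password` are assumptions: a real site's login form will have its own field names and may require extra hidden fields or tokens.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a form POST for a login page.

    The field names "username" and "password" are hypothetical; check the
    target site's login form for the names it actually expects.
    """
    form = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(login_url, data=form)  # supplying data makes this a POST


def fetch_with_session(opener, request_or_url):
    """Fetch through the cookie-aware opener and return the decoded body."""
    with opener.open(request_or_url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The intended flow is to call `fetch_with_session(opener, build_login_request(...))` once, then reuse the same `opener` for every page that needs the logged-in session.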

Later on, I'll also want the program to follow certain links and download information from the pages they point to.
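Following links amounts to a small crawler. The sketch below shows one way to do it in Python: a breadth-first traversal with a visited set and a page limit. Everything here is illustrative: `crawl`, `HrefParser`, and `same_host_as` are made-up names, the `fetch` argument is any function that takes a URL and returns HTML, and restricting the crawl to the starting host is only one possible rule for which links to explore.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefParser(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_host_as(start_url):
    """Return a predicate that accepts only URLs on start_url's host."""
    host = urlparse(start_url).netloc
    return lambda url: urlparse(url).netloc == host


def crawl(start_url, fetch, should_follow, max_pages=20):
    """Breadth-first crawl; returns {url: html} for every page fetched."""
    seen = {start_url}           # URLs already queued, to avoid loops
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:          # urllib's URLError is a subclass of OSError
            continue             # skip pages that fail to download
        pages[url] = html
        parser = HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Make the link absolute and drop any #fragment part.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen and should_follow(target):
                seen.add(target)
                queue.append(target)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the download step, so the same function can be used with a plain downloader or with one that carries login cookies. A real crawler should also respect the site's robots.txt and pause between requests.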

What are the best programming languages or tools to do this?

© Stack Overflow or respective owner
