wget recursively download from pages with lots of links

Posted by Shadow on Super User


When using wget with the recursive option turned on, I get an error message when it tries to download a file. It treats the link as a downloadable file, when in reality it should just be following it to reach the page that actually contains the files (or more links to follow) that I want.

wget -r -l 16 --accept=jpg website.com

The error message is: .... since it should be rejected. This usually happens when the link wget is trying to fetch ends with an SQL statement. The problem doesn't occur, however, when I run the very same wget command directly on that link. I want to know how exactly it is trying to fetch the pages. I guess I could always poke around the source, although I don't know how messy the project is. I might also be misunderstanding exactly what "recursive" means in the context of wget. I thought it would run through, travel into each link, and grab the files with the extension I have requested; a sketch of what I mean is below.
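For illustration, here is roughly what I expect the recursion to do, written out with the long option names. This is just a sketch: the example.com address stands in for the real site, and the --no-parent flag is an extra I have seen suggested elsewhere, not part of my original command.

# Recurse up to 16 levels deep, follow pages, and keep only .jpg files;
# example.com is a placeholder for the actual site.
wget --recursive --level=16 --accept=jpg --no-parent http://example.com/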

I posted this over at Stack Overflow, but they sent me over here. :) Hoping you guys can help.

