wget not respecting my robots.txt. Is there an interceptor?
Posted by Jane Wilkie on Pro Webmasters, published 2011-06-29. Tagged: robots.txt
I have a website where I post CSV files as a free service. Recently I have noticed that wget and libwww have been scraping it pretty hard, and I was wondering how to curb that, even if only a little.
I have implemented a robots.txt policy; I have posted it below:
User-agent: wget
Disallow: /
User-agent: libwww
Disallow: /
User-agent: *
Disallow: /  
Issuing a wget from my totally independent Ubuntu box shows that the robots.txt just doesn't seem to have any effect; wget still pulls the file down, like so:
wget http://myserver.com/file.csv
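From what I can tell from the wget manual, robots.txt is only honoured during recursive or mirrored retrievals, so a single-URL fetch like the one above never even requests it, and a recursive caller can turn the check off anyway. A rough comparison, with myserver.com standing in for my real host:

# single-file fetch: wget never asks for /robots.txt, so the Disallow rules do nothing
wget http://myserver.com/file.csv
# recursive fetch: wget downloads /robots.txt first and skips disallowed paths
wget -r http://myserver.com/
# ...but a caller can still opt out, so robots.txt is advisory at best
wget -r -e robots=off http://myserver.com/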
Anyway, I don't mind people grabbing the info; I just want to implement some sort of flood control, like a wrapper or an interceptor.
Does anyone have a thought about this, or could you point me in the direction of a resource? I realize it might not even be possible; I'm just after some ideas.
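The crudest thing I have come up with so far is connection-level throttling at the firewall. A minimal sketch, assuming the box runs Linux with iptables and the site is served on port 80 (the 20-connections-per-minute threshold is just a placeholder):

# remember each source IP that opens a new connection to port 80
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set --name CSVFLOOD
# drop further new connections from any IP that has opened more than 20 in the last 60 seconds
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 60 --hitcount 20 --name CSVFLOOD -j DROP

Anything smarter than that (per-user-agent throttling, or a wrapper script in front of the CSV downloads) would presumably have to live in the web server or application layer, which is really what I am asking about.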
Janie
© Pro Webmasters or respective owner