What are the best measures to protect content from being crawled?

Posted by Moak on Stack Overflow
Published on 2011-02-08T07:13:34Z

Filed under: html | security

I've been crawling a lot of websites for content recently, and I'm surprised that no site so far has put up much resistance. Ideally, the site I'm working on should not be so easy to harvest. So I'm wondering: what are the best methods to stop bots from harvesting your web content? Obvious solutions:

  • Robots.txt (yea right)
  • IP blacklists

What can be done to catch bot activity? What can be done to make data extraction difficult? What can be done to give them crap data?
Just looking for ideas; there's no single right or wrong answer.
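One classic idea that covers both "catch bot activity" and "give them crap data" is a honeypot: disallow a path in robots.txt and hide a link to it in the page markup. Humans never see the link and polite crawlers obey the disallow, so any client that requests it is almost certainly a misbehaving bot. A hypothetical sketch — the path, responses, and handler are made up for illustration:

```python
# Honeypot sketch: /trap is disallowed in robots.txt and only reachable
# via a hidden link, so any visitor to it is treated as a bot.
BANNED_IPS = set()

ROBOTS_TXT = "User-agent: *\nDisallow: /trap\n"
HIDDEN_LINK = '<a href="/trap" style="display:none" rel="nofollow">trap</a>'

def handle_request(ip, path):
    """Toy request handler: ban anyone who touches the trap path."""
    if ip in BANNED_IPS:
        return 403, "Forbidden"
    if path == "/trap":
        BANNED_IPS.add(ip)  # well-behaved clients never reach this path
        return 403, "Forbidden"
    return 200, "normal content"
```

Instead of returning 403, a banned IP could be served plausible-looking garbage data, which poisons the harvester's dataset without tipping it off that it has been detected.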

© Stack Overflow or respective owner
