Understanding the maximum hit rate supported by a web server

Posted by SNag on Pro Webmasters, 2014-08-19

I would like to crawl a publicly available site (one that's legal to crawl) for a personal project. From a brief trial of the crawler, I gathered that my program hits the server with a new HTTP request about 8 times per second. At this rate, I estimate I'd need about 60 full days of crawling to obtain the full set of data.
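For context, here's a minimal sketch of the sort of throttled loop my crawler runs. The URL, the `process` step, and the item IDs are placeholders, not the real site:

```python
import time
import requests  # assuming the requests library is available

BASE_URL = "https://example.com/items/"  # placeholder for the actual site
DELAY = 1.0 / 8  # fixed pause keeps the rate near 8 requests per second


def process(html):
    """Placeholder for the actual parsing/storage step."""
    pass


def crawl(item_ids):
    session = requests.Session()  # reuse one connection across requests
    for item_id in item_ids:
        response = session.get(f"{BASE_URL}{item_id}", timeout=10)
        response.raise_for_status()
        process(response.text)
        time.sleep(DELAY)  # throttle before the next request
```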

While the site is legal to crawl, I understand it can still be unethical to crawl at a rate that inconveniences the site's regular traffic. What I'd like to understand here is: how high is 8 hits per second for the server I'm crawling? Could I do 4 times that (by running 4 instances of my crawler in parallel, as sketched below) to bring the total effort down to 15 days instead of 60?
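If 4x does turn out to be acceptable, my plan would be to cap the combined rate globally rather than let four independent processes drift apart. A rough sketch of what I have in mind, using threads and a shared limiter of my own design (the `fetch` call stands in for the request from the first sketch):

```python
import threading
import time


class RateLimiter:
    """Shared limiter: all workers together stay under max_per_sec."""

    def __init__(self, max_per_sec):
        self.interval = 1.0 / max_per_sec
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        # Reserve the next send slot under the lock, then sleep outside it.
        with self.lock:
            now = time.monotonic()
            if self.next_slot < now:
                self.next_slot = now
            wait_for = self.next_slot - now
            self.next_slot += self.interval
        if wait_for > 0:
            time.sleep(wait_for)


limiter = RateLimiter(max_per_sec=32)  # 4 x 8, the rate I'm considering


def fetch(item_id):
    """Placeholder for the actual HTTP request from the first sketch."""
    pass


def worker(item_ids):
    for item_id in item_ids:
        limiter.wait()  # blocks until this worker is allowed to send
        fetch(item_id)
```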

How do you find the maximum hit rate a web server supports? What would be the theoretical (and ethical) upper limit for the crawl rate, so as not to adversely affect the server's routine traffic?
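Absent a measured limit, the most defensible baseline I've found is whatever the site itself asks for: Python's standard `urllib.robotparser` can read a `Crawl-delay` directive from robots.txt if the site sets one. The host and user-agent string below are placeholders:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder host
rp.read()  # fetches and parses robots.txt

agent = "my-crawler"  # hypothetical user-agent string
if rp.can_fetch(agent, "https://example.com/items/1"):
    delay = rp.crawl_delay(agent)  # None if no Crawl-delay is set
    print(f"Site-requested delay between requests: {delay}")
```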
