Force request to miss cache but still store the response

Posted by Tom Marthenal on Server Fault
Published on 2012-09-08T05:53:55Z

I have a slow web app that I've placed Varnish in front of. All of the pages are static (they don't vary from user to user), but they need to be updated every 5 minutes so they contain recent data.

I have a simple script (wget --mirror) that crawls the entire website every 15 minutes. Each crawl takes about 5 minutes. The point of the crawl is to update every page in the Varnish cache so that a user never has to wait for the page to generate (since all pages have been generated recently thanks to the spider).

The timeline looks like this:

  • 00:00:00: Cache flushed
  • 00:00:00: Spider starts crawling to update cache with new pages
  • 00:05:00: Spider finishes crawling; all pages are up to date until 00:15:00

A request that comes in between 00:00:00 and 00:05:00 might hit a page that hasn't been updated yet, and will be forced to wait a few seconds for a response. This isn't acceptable.

What I'd like to do is, perhaps using some VCL magic, always forward requests from the spider to the backend, but still store the response in the cache. This way, a user will never have to wait for a page to generate, since there is no 5-minute window in which parts of the cache are empty (except perhaps at server startup).

How can I do this?
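
One direction that might fit, given here as an untested sketch rather than a confirmed configuration: Varnish has a `req.hash_always_miss` variable that makes a single request skip the cache lookup while still letting the backend response be stored. The ACL name `spider` and the address `127.0.0.1` are assumptions (the crawler is presumed to run on the same host as Varnish), and the syntax follows Varnish 3.x.

```vcl
# Hypothetical ACL: the addresses the spider crawls from.
# 127.0.0.1 is an assumption; adjust to wherever wget actually runs.
acl spider {
    "127.0.0.1";
}

sub vcl_recv {
    if (client.ip ~ spider) {
        # Treat this request as a cache miss even if an object exists,
        # so it is fetched from the backend. The fresh response is
        # still inserted into the cache and served to later requests.
        set req.hash_always_miss = true;
    }
}
```

If this behaves as documented, the flush at 00:00:00 should no longer be needed: each page keeps being served from its old cached copy until the spider's fetch replaces it. The object TTL would presumably need to be longer than the interval between crawls, so that pages do not expire before the spider returns to them.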

© Server Fault or respective owner
