What would cause memcached to hang for 2+ seconds?

Posted by Brad Dwyer on Server Fault
Published on 2012-03-26T21:16:20Z Indexed on 2012/03/27 17:32 UTC

I'm going nuts trying to scale memcached. From their site:

Memcached operations are almost all O(1). Connecting to it and issuing a get or stat command should never lag. If connecting lags, you may be hitting the max connections limit. See ServerMaint for details on stats to monitor.

If issuing commands lags, you can have a number of tuning problems. Most common are hardware problems, not enough RAM (swapping), network problems (bandwidth, dropped packets, half-duplex connections). On rare occasion OS bugs or memcached bugs can contribute.

Well, it is most certainly not performing like an O(1) operation for me. Under low to normal load on our site, memcached response times for get and set operations are about 0.001 seconds. Not bad. But if we triple the load, we get outliers that take 100x (or in rare cases 1000x!) that long. I even had one instance where it took 2.2442 seconds for memcached to store a value.

Obviously this is killing our site.
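Because the problem is a handful of outliers rather than a slow average, it can help to record every call's duration and look at the tail (p99, p99.9, max) instead of a mean. The sketch below is a hypothetical timing harness, not something from the original post: `time_calls` wraps any zero-argument callable, and `percentile` uses the nearest-rank definition.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_calls(op: Callable[[], object], n: int) -> List[float]:
    """Call op() n times; return each call's wall-clock duration in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return samples


def percentile(sorted_samples: List[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of an already sorted list."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Median plus tail figures; a mean alone would hide rare 2 s outliers."""
    s = sorted(samples)
    return {
        "median": statistics.median(s),
        "p99": percentile(s, 99),
        "p999": percentile(s, 99.9),
        "max": s[-1],
    }
```

In practice `op` would be something like `lambda: client.get(key)` for whatever memcached client the application uses; `client` and `key` are placeholders here.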

Here's the output of Memcached->getStats during one of the slow periods:

        [pid] => 18079
        [uptime] => 8903
        [threads] => 4
        [time] => 1332795759
        [pointer_size] => 32
        [rusage_user_seconds] => 26
        [rusage_user_microseconds] => 503872
        [rusage_system_seconds] => 125
        [rusage_system_microseconds] => 477008
        [curr_items] => 42099
        [total_items] => 422500
        [limit_maxbytes] => 943718400
        [curr_connections] => 84
        [total_connections] => 4946
        [connection_structures] => 178
        [bytes] => 7259957
        [cmd_get] => 1679091
        [cmd_set] => 351809
        [get_hits] => 1662048
        [get_misses] => 17043
        [evictions] => 0
        [bytes_read] => 109388476
        [bytes_written] => 3187646458
        [version] => 1.4.13
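A few derived figures can be read straight off these counters. The sketch below only does arithmetic on the values posted above: it combines the whole-second and microsecond `rusage` fields (about 26.5 s user CPU against about 125.5 s system CPU), computes the get hit ratio, and expresses total CPU time as a share of uptime. The helper name `cpu_seconds` is hypothetical.

```python
from typing import Dict

# Values copied from the getStats output above.
stats: Dict[str, int] = {
    "uptime": 8903,
    "rusage_user_seconds": 26,
    "rusage_user_microseconds": 503872,
    "rusage_system_seconds": 125,
    "rusage_system_microseconds": 477008,
    "cmd_get": 1679091,
    "get_hits": 1662048,
    "get_misses": 17043,
}


def cpu_seconds(s: Dict[str, int], kind: str) -> float:
    """Combine the whole-second and microsecond rusage fields ('user' or 'system')."""
    return s[f"rusage_{kind}_seconds"] + s[f"rusage_{kind}_microseconds"] / 1_000_000


user = cpu_seconds(stats, "user")
system = cpu_seconds(stats, "system")
hit_ratio = stats["get_hits"] / stats["cmd_get"]
# CPU consumed by the memcached process, averaged over its whole uptime
# and expressed as a share of one core.
cpu_share = (user + system) / stats["uptime"]

print(f"user CPU:   {user:.2f} s")
print(f"system CPU: {system:.2f} s ({system / user:.1f}x user)")
print(f"hit ratio:  {hit_ratio:.4f}")
print(f"CPU share of uptime: {cpu_share:.2%}")
```

These are cumulative counters since startup, so a low average says little about short bursts; taking two `getStats` snapshots a few seconds apart during a slow period and diffing them would likely be more telling.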

So things that I have ruled out so far are:

  • Hitting the max connections limit: curr_connections of 84 is well below the default maximum of 1024.
  • Swapping: memcached runs on a dedicated machine with 1024M of RAM, of which 900M is allocated to memcached (limit_maxbytes = 943718400 bytes, exactly 900 MiB). According to the bytes stat, it is only using about 7MB of that.
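The two ruled-out limits can be expressed as fractions of their ceilings. This sketch assumes the server was started with memcached's default connection limit (`-c 1024`); the `stats settings` command should report the actual value as `maxconns` if it differs. The function name `headroom` is hypothetical.

```python
from typing import Dict

# Assumption: default connection limit (-c 1024). Substitute the real
# "maxconns" value from `stats settings` if the server was started otherwise.
DEFAULT_MAX_CONNECTIONS = 1024


def headroom(stats: Dict[str, int], max_connections: int = DEFAULT_MAX_CONNECTIONS) -> Dict[str, float]:
    """Express connection and memory usage as fractions of their limits."""
    return {
        "connection_fraction": stats["curr_connections"] / max_connections,
        "memory_fraction": stats["bytes"] / stats["limit_maxbytes"],
        "limit_mib": stats["limit_maxbytes"] / 2**20,
        "evictions": stats["evictions"],
    }


# Values taken from the getStats output above.
posted = {
    "curr_connections": 84,
    "bytes": 7259957,
    "limit_maxbytes": 943718400,
    "evictions": 0,
}
# connection_fraction = 84/1024 = 0.08203125, limit_mib = 900.0
print(headroom(posted))
```

Note that `memory_fraction` only describes memcached's own item storage; whether the host itself is swapping has to be checked with the operating system's tools.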

How would I diagnose the other hardware problems? prstat doesn't show much going on in terms of CPU or memory usage. I'm not sure how to check for network problems, but since this is a dedicated server on the same private network as the web box, I don't think it's a connectivity issue (ping between the boxes is under a millisecond).
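Ping measures ICMP, not the TCP path memcached actually uses. One way to narrow things down is to time, separately, the TCP connect and a trivial command round trip from the web box. The sketch below is a hypothetical probe (the name `memcached_rtt` is made up) that sends memcached's text-protocol `version` command and waits for the single reply line; it assumes the server listens on the default port 11211.

```python
import socket
import time
from typing import Tuple


def memcached_rtt(host: str, port: int = 11211, timeout: float = 5.0) -> Tuple[float, float, str]:
    """Return (connect_seconds, command_roundtrip_seconds, reply_line).

    Opens a fresh TCP connection each call, so connect lag and command lag
    are measured separately.
    """
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        t1 = time.perf_counter()
        sock.sendall(b"version\r\n")
        buf = b""
        # The reply to `version` is a single line: "VERSION <x.y.z>\r\n".
        while not buf.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("connection closed before reply")
            buf += chunk
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1, buf.decode("ascii", errors="replace").strip()
```

Calling this in a loop (for example once a second, with `MEMCACHED_HOST` as a placeholder address) during a load test and logging any sample above some chosen threshold could show whether the spikes appear in the connect step, the command round trip, or neither.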

Is there something else I'm missing here? It's driving me nuts.

Edit: I forgot to mention that I've tried both persistent and non-persistent connections, with little to no impact.

© Server Fault or respective owner
