What's going on with my server? High load, lots of idle CPU time, low disk utilization

Posted by Jonathan on Server Fault
Published on 2010-12-25T17:01:05Z Indexed on 2010/12/25 18:55 UTC

I run a web site and send a legitimate opt-in, daily email newsletter to subscribers. Both the web hosting and email sending are done by the same machine.

I have about 100,000 subscribers who have opted in to my daily email newsletter. My PHP script did a pretty good job of sending mail to all of them until fairly recently, but as the list has grown it can no longer keep up.

When I run top, the load average is very high: usually at least 6 or 7, and sometimes as high as 15, even though the machine has only two CPUs. However, when I run sar, my CPU is idle about 30% of the time on average, so it seems I'm not CPU bound. When I run iostat, I don't appear to be disk bound either, because %util for each device is very low (no more than 5%).

Given that I don't seem to be CPU bound or disk bound, why is top reporting such high load?

Additionally, since I don't seem to be CPU bound or disk bound, why is my email sending script not able to keep up?
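
One way to put the load figure in context is to divide it by the number of CPUs. The following is a minimal Python sketch, not part of the original setup: `load_per_cpu` and `read_current_load_per_cpu` are hypothetical helper names, and it assumes a Linux system where /proc/loadavg is available. On Linux the load average counts runnable tasks as well as tasks in uninterruptible sleep, so a per-CPU value well above 1 suggests tasks are queuing for something.

```python
import os


def load_per_cpu(loadavg_text, cpu_count):
    """Return the 1-, 5- and 15-minute load averages divided by cpu_count.

    loadavg_text is expected in the /proc/loadavg format:
    "<1min> <5min> <15min> <runnable>/<total> <last_pid>"
    """
    fields = loadavg_text.split()
    return tuple(float(value) / cpu_count for value in fields[:3])


def read_current_load_per_cpu():
    # Linux-only: reads the live values for the machine this runs on.
    with open("/proc/loadavg") as handle:
        return load_per_cpu(handle.read(), os.cpu_count() or 1)


# Illustrative input using the load averages from the top header in this post.
# The "5/168 26958" tail is only a placeholder in /proc/loadavg format.
sample = "7.65 8.79 8.28 5/168 26958"
print(load_per_cpu(sample, 2))
```

Calling `read_current_load_per_cpu()` on the server itself would give the live figures instead of the pasted ones.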


Here's what I see when running top:

top - 11:33:28 up 74 days, 18:49,  2 users,  load average: 7.65, 8.79, 8.28
Tasks: 168 total,   5 running, 162 sleeping,   0 stopped,   1 zombie
Cpu(s): 38.9%us, 58.6%sy,  0.8%ni,  0.0%id,  0.7%wa,  0.2%hi,  0.8%si,  0.0%st
Mem:   3083012k total,  2144436k used,   938576k free,   281136k buffers
Swap:  2048248k total,    39164k used,  2009084k free,  1470412k cached
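
To read the Cpu(s) line above as numbers rather than by eye, a small sketch like the following could split it into fields and add up user and system time. `parse_cpu_line` is a hypothetical helper, and the regular expression assumes the older `38.9%us` layout shown in this post; other top versions print the summary line differently.

```python
import re


def parse_cpu_line(line):
    """Turn a top 'Cpu(s):' summary line into {label: percent}."""
    pairs = re.findall(r"([\d.]+)%(\w+)", line)
    return {label: float(value) for value, label in pairs}


# The summary line exactly as pasted in the top output above.
cpu_line = ("Cpu(s): 38.9%us, 58.6%sy,  0.8%ni,  0.0%id,  "
            "0.7%wa,  0.2%hi,  0.8%si,  0.0%st")

fields = parse_cpu_line(cpu_line)
print(f"user + system: {fields['us'] + fields['sy']:.1f}%")
print(f"idle: {fields['id']:.1f}%, iowait: {fields['wa']:.1f}%")
```

This only describes the single snapshot that top happened to capture, so it may differ from the longer sar averages.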

Here's what I see when running iostat -mx:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          34.80    1.20   55.24    0.37    0.00    8.38

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.19    71.70  1.59 29.45     0.02     0.07     5.90     0.55   17.82   1.16   3.59
sda1              0.00     0.00  0.00  0.00     0.00     0.00     7.10     0.00   13.80  13.72   0.00
sda2              0.05    50.45  1.13 24.57     0.01     0.29    24.25     0.35   13.43   1.15   2.97
sda3              0.05    10.17  0.20  2.33     0.01     0.05    43.75     0.05   20.96   2.45   0.62
sda4              0.00     0.00  0.00  0.00     0.00     0.00     2.00     0.00   70.50  70.50   0.00
sda5              0.07     0.22  0.03  0.07     0.00     0.00    32.84     0.08  856.19   8.03   0.08
sda6              0.02     5.45  0.03  0.72     0.00     0.02    67.55     0.02   26.72   5.26   0.39
sda7              0.00     1.56  0.00  0.42     0.00     0.01    38.04     0.00    8.88   5.84   0.24
sda8              0.01     3.84  0.20  1.35     0.00     0.02    28.55     0.05   31.90   4.08   0.63
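
The "no more than 5%" reading of %util can be checked mechanically. This is a rough sketch under the assumption that the output has the column layout shown above (it varies between sysstat versions); `parse_iostat_rows` is a hypothetical helper, and only three of the rows above are included to keep the example short.

```python
def parse_iostat_rows(text):
    """Parse 'iostat -x' device rows into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = [name.rstrip(":") for name in lines[0]]
    rows = []
    for parts in lines[1:]:
        row = {"Device": parts[0]}
        # Pair every numeric column with its header name.
        row.update({name: float(value)
                    for name, value in zip(header[1:], parts[1:])})
        rows.append(row)
    return rows


# Header and three device rows copied from the iostat output above.
IOSTAT_SAMPLE = """
Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.19    71.70  1.59 29.45     0.02     0.07     5.90     0.55   17.82   1.16   3.59
sda2              0.05    50.45  1.13 24.57     0.01     0.29    24.25     0.35   13.43   1.15   2.97
sda5              0.07     0.22  0.03  0.07     0.00     0.00    32.84     0.08  856.19   8.03   0.08
"""

rows = parse_iostat_rows(IOSTAT_SAMPLE)
busiest = max(rows, key=lambda row: row["%util"])
slowest = max(rows, key=lambda row: row["await"])
print("highest %util:", busiest["Device"], busiest["%util"])
print("highest await:", slowest["Device"], slowest["await"])
```

Looking at await alongside %util is a design choice of this sketch: a device can show low utilization while individual requests still wait a long time.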

Here's what I see when running sar:

09:40:02 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:50:01 AM       all     30.59      1.01     49.80      0.23      0.00     18.37
10:00:08 AM       all     31.73      0.92     51.66      0.13      0.00     15.55
10:10:06 AM       all     30.43      0.99     48.94      0.26      0.00     19.38
10:20:01 AM       all     29.58      1.00     47.76      0.25      0.00     21.42
10:30:01 AM       all     29.37      1.02     47.30      0.18      0.00     22.13
10:40:06 AM       all     32.50      1.01     52.94      0.16      0.00     13.39
10:50:01 AM       all     30.49      1.00     49.59      0.15      0.00     18.77
11:00:01 AM       all     29.43      0.99     47.71      0.17      0.00     21.71
11:10:07 AM       all     30.26      0.93     49.48      0.83      0.00     18.50
11:20:02 AM       all     29.83      0.81     48.51      1.32      0.00     19.52
11:30:06 AM       all     31.18      0.88     51.33      1.15      0.00     15.47
Average:          all     26.21      1.15     42.62      0.48      0.00     29.54
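
Every interval listed above has a lower %idle than the Average line, so the average presumably includes earlier intervals that were not pasted here. As a sanity check, the sketch below averages %idle over just the listed rows. `mean_idle` is a hypothetical helper, and it assumes %idle is the last column, as in this output.

```python
def mean_idle(sar_text):
    """Average the %idle column (last field) over the per-interval rows.

    The column header and the 'Average:' summary row are skipped.
    """
    values = []
    for line in sar_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] == "Average:" or "all" not in parts:
            continue
        values.append(float(parts[-1]))
    return sum(values) / len(values)


# The sar output exactly as pasted above.
SAR_SAMPLE = """
09:40:02 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:50:01 AM       all     30.59      1.01     49.80      0.23      0.00     18.37
10:00:08 AM       all     31.73      0.92     51.66      0.13      0.00     15.55
10:10:06 AM       all     30.43      0.99     48.94      0.26      0.00     19.38
10:20:01 AM       all     29.58      1.00     47.76      0.25      0.00     21.42
10:30:01 AM       all     29.37      1.02     47.30      0.18      0.00     22.13
10:40:06 AM       all     32.50      1.01     52.94      0.16      0.00     13.39
10:50:01 AM       all     30.49      1.00     49.59      0.15      0.00     18.77
11:00:01 AM       all     29.43      0.99     47.71      0.17      0.00     21.71
11:10:07 AM       all     30.26      0.93     49.48      0.83      0.00     18.50
11:20:02 AM       all     29.83      0.81     48.51      1.32      0.00     19.52
11:30:06 AM       all     31.18      0.88     51.33      1.15      0.00     15.47
Average:          all     26.21      1.15     42.62      0.48      0.00     29.54
"""

print(f"mean %idle over listed intervals: {mean_idle(SAR_SAMPLE):.2f}")
```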

Here are the top handful of processes listed at the particular time I happened to run top -c:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8180 mysql     16   0 57448  19m 2948 S 26.6  0.7   4702:26 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/bristno.pid --skip-external-locking
26956 brristno  17   0     0    0    0 Z  8.0  0.0   0:00.24 [php] <defunct>
26958 brristno  17   0 94408  43m  37m R  5.0  1.4   0:00.15 /usr/bin/php /home/brristno/public_html/dbv.php
22852 nobody    16   0  9628 2900 1524 S  0.7  0.1   0:00.17 /usr/local/apache/bin/httpd -k start -DSSL
 8591 brristno  34  19 96896  13m 6652 S  0.3  0.4   0:29.82 /usr/local/bin/php /home/brristno/bin/mailer.php 1qwqyb6 i0gbor
24469 nobody    16   0  9628 2880 1508 S  0.3  0.1   0:00.08 /usr/local/apache/bin/httpd -k start -DSSL
25495 nobody    15   0  9628 2876 1500 S  0.3  0.1   0:00.06 /usr/local/apache/bin/httpd -k start -DSSL
26149 nobody    15   0  9628 2864 1504 S  0.3  0.1   0:00.04 /usr/local/apache/bin/httpd -k start -DSSL

© Server Fault or respective owner
