Linux server: Apache httpd processes drive I/O wait close to 100% and lock up the server

Posted by user3682065 on Server Fault
Published on 2014-08-06T18:35:21Z Indexed on 2014/08/21 16:22 UTC

For about five days now, seemingly out of the blue, my Linux server has been locking up from time to time.

The pattern is always the same, as far as I can tell from top and iotop around the time it starts:

One or more httpd processes (usually just one) hang and start using 100% of the CPU, %wa climbs close to 100%, and iotop shows several httpd processes at 99.99% in the IO column.

I'm also running an SVN server on this machine through Apache, and the one way I have consistently been able to reproduce the problem is to commit new files to, or run an SVN update from, the repository on this server (I am the only one using this repository). This always triggers the lockup, yet until very recently I had no problems at all checking in or out of SVN.

Sometimes, though, it seems to happen for no detectable reason at all.

So it seems there is some issue with my Apache setup that makes its processes do a lot of disk reads and writes on certain triggers.

Can anyone help me track that issue down?

EDIT: OK, now it's happening again:

This is top:

[root@server ~]# top
top - 10:56:54 up  2:59,  5 users,  load average: 171.46, 70.35, 27.01
Tasks: 328 total,   2 running, 326 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  2.0%sy,  0.0%ni,  0.0%id, 96.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2021144k total,  1968192k used,    52952k free,     2500k buffers
Swap:  4194288k total,  2938584k used,  1255704k free,    39008k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10390 apache    20   0 2774m 936m 6200 D  2.0 47.4   1:52.27 httpd
 2149 root      20   0  927m  13m 1040 S  0.7  0.7   1:50.46 namecoind
   11 root      20   0     0    0    0 R  0.3  0.0   0:30.10 events/0
   23 root      20   0     0    0    0 S  0.3  0.0   0:17.88 kblockd/1
 2049 root      20   0  382m 4932 2880 D  0.3  0.2   0:03.67 httpd
 2144 root      20   0 1702m  69m 1164 S  0.3  3.5   5:19.68 bitcoind
 6325 root      20   0 15164 1100  656 R  0.3  0.1   0:11.09 top
10311 apache    20   0  387m 9496 7320 D  0.3  0.5   0:01.89 httpd
10313 apache    20   0  391m  10m 7364 D  0.3  0.5   0:02.40 httpd
10466 apache    20   0  399m  12m 7392 D  0.3  0.7   0:02.41 httpd
10599 apache    20   0  391m 9324 7340 D  0.3  0.5   0:00.15 httpd
10628 apache    20   0  384m 7620 4052 D  0.3  0.4   0:00.01 httpd
10633 apache    20   0  384m 7048 3504 D  0.3  0.3   0:00.01 httpd
10634 apache    20   0  384m 8012 4048 D  0.3  0.4   0:00.02 httpd
10638 apache    20   0  400m  22m 9.8m D  0.3  1.1   0:01.93 httpd
10640 apache    20   0  385m 8288 4028 D  0.3  0.4   0:00.03 httpd
10641 apache    20   0  401m  21m 6376 D  0.3  1.1   0:01.45 httpd
10759 apache    20   0  385m 8816 3480 D  0.3  0.4   0:01.45 httpd
10773 apache    20   0  384m 8044 3464 D  0.3  0.4   0:00.02 httpd
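
Nearly every httpd line in the top output above has D in the S column, which is uninterruptible sleep (typically a process waiting on disk I/O). Such processes count toward the load average, which would fit the very high load figures. For capturing that list without watching top, here is a minimal Python sketch, assuming a Linux /proc filesystem. The function names `parse_stat` and `blocked_processes` are made up for illustration, not part of any tool:

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    return int(pid_text), comm, right.split()[0]


def blocked_processes(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `blocked_processes()` every few seconds while the lockup builds should show which processes get stuck first, though it only reports the state and not what each process is waiting for.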

This is an iotop snapshot:

Total DISK READ: 5.93 M/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
10732 be/4 apache      3.76 K/s    0.00 B/s  0.00 % 58.48 % httpd
  876 be/3 root        0.00 B/s   52.68 K/s  0.00 % 52.98 % [jbd2/dm-1-8]
10906 be/4 root      124.17 K/s    0.00 B/s  0.00 % 23.03 % sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
 2156 be/4 root      206.94 K/s    0.00 B/s  0.00 % 21.15 % bitcoind
10904 be/4 mysql       0.00 B/s    0.00 B/s  0.00 % 18.94 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
10773 be/4 apache      7.53 K/s    0.00 B/s  0.00 % 14.77 % httpd
10641 be/4 apache     15.05 K/s    0.00 B/s  0.00 % 11.57 % httpd
10399 be/4 apache   1057.29 K/s    0.00 B/s 43.16 % 10.56 % httpd
10682 be/4 sw-cp-se  158.03 K/s    0.00 B/s  0.00 %  7.45 % sw-engine-cgi -c /usr/local/psa/admin/conf/php.ini -d auto_prepend_file=auth.php3 -u psaadm
10774 be/4 apache      3.76 K/s    0.00 B/s  0.00 %  6.53 % httpd
10624 be/4 apache      0.00 B/s    0.00 B/s  0.00 %  5.53 % httpd
10356 be/4 apache    899.26 K/s    0.00 B/s 35.52 %  4.01 % httpd
10795 be/4 apache      0.00 B/s    0.00 B/s  0.00 %  3.93 % httpd
10804 be/4 apache      7.53 K/s    0.00 B/s  0.00 %  3.08 % httpd
 4379 be/4 root        2.89 M/s    0.00 B/s 99.99 %  0.00 % namecoind
10619 be/4 apache    462.80 K/s    0.00 B/s  7.80 %  0.00 % httpd
10636 be/4 apache      3.76 K/s    0.00 B/s  0.00 %  0.00 % httpd
10716 be/4 mysql     105.35 K/s    0.00 B/s  5.92 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 1988 be/4 root       18.81 K/s    0.00 B/s  0.00 %  0.00 % spamd_full.sock
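
The SWAPIN column in the iotop snapshot above (the share of time a thread spent swapping in) is non-zero for several httpd threads and for namecoind, so it may be worth checking how much swap each process holds. A minimal Python sketch, assuming the kernel exposes a `VmSwap` line in /proc/<pid>/status (older kernels may not). The helper names `status_field` and `top_swappers` are hypothetical:

```python
import os


def status_field(status_text, key):
    """Return the value of one 'Key: value' line from /proc/<pid>/status."""
    prefix = key + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None  # field absent (e.g. kernel threads have no VmSwap)


def top_swappers(proc_root="/proc", limit=10):
    """Return up to `limit` (swap_kb, pid, name) tuples, largest first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited while we were scanning
        swap = status_field(text, "VmSwap")
        if swap is None:
            continue
        kb = int(swap.split()[0])  # value looks like "958464 kB"
        if kb > 0:
            name = status_field(text, "Name") or "?"
            usage.append((kb, int(entry), name))
    return sorted(usage, reverse=True)[:limit]
```

If one httpd process dominates the list, that would point toward memory pressure rather than the disk itself, but the numbers only show where swap is held, not why.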

I also ran lsof -p on PID 10390, which was at the top of the top output. The last line of the lsof output gives some idea of which request this was, and it shows CLOSE_WAIT:

httpd   10390 apache   34u  IPv6 315879       0t0     TCP default-domain.com:https->crawl-66-249-65-91.googlebot.com:42907 (CLOSE_WAIT)

I'm still not sure what exactly is causing all of this, though.

I killed that process, but %wa and the load average remained high. I also stopped mysqld and other services. The load only really goes down once I stop httpd altogether, and even then I can't start it again until I find the remaining hung httpd processes via "netstat -tulpn" and kill them, or run "killall -9 httpd", wait a while for it to work through them all, and then run /etc/init.d/httpd start.

© Server Fault or respective owner