Debugging IO limitation

Posted by Martin F on Server Fault See other posts from Server Fault or by Martin F
Published on 2010-05-08T18:57:08Z Indexed on 2010/05/08 19:00 UTC
Read the original article Hit count: 384

Filed under:
|
|

I have a Fedora box with some severe IO limitations which I have no idea how to debug.

The server has a Areca Technology Corp. ARC-1130 12-Port PCI-X to SATA RAID Controller with 12 7200 RPM 1.5 TB disks and a Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller.

uname -a output: 2.6.32.11-99.fc12.x86_64 #1 SMP Mon Apr 5 19:59:38 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

The server is a file server running Nginx with the stub status module enabled, so I can see the current amount of connections. The problem present itself when I have a high number of simultaneous connections in a writing state. Usually around 350, at this very moment it's at 590 and the server is almost unusable and stuck at 230mbit/s.

If I run stop and hit 1 to see CPU core usages I have all 4 cores with around 99% io wait, if I run iotop the nginx workers are the only processes producing any read load, currently at around 25MB/s. I have each of the workers bound to their own core.

Initially I figured it was just the disks being bugged. But I've run fscheck and smartmontools checks and found no errors. I also ran an iozone test which you can see the result of here: http://www.pastie.org/951667.txt?key=fimcvljulnuqy2dcdxa

Additionally, when the amount of connections are low I have no problem getting a good speed. If I wget over the local network it easily hits 60MB/sec.

Right now I just tried putting a file in /dev/shm, then I symlinked a file from the public dir to it and used wget over the local network and only got 50KB/s.

Also, if I try to cp /dev/shm/test /root/test it quickly copies around 740MB and then slows down HEAVILY. Again with iotop reporting 99% iowait.

I'm not really sure how to go about figuring out what the problems are. It could be a natural disk limitation but then the file from /dev/shm ought to transfer so it seems there's a network limit, but that's fine when there's not many connections. Perhaps it's a TCP stack problem but I really have no idea how to check that.

Any suggestions on how to proceed with debugging would be very welcome. If additional information is required then let me know and I'll try to get it.

Thanks.

© Server Fault or respective owner

Related posts about io

Related posts about networking