How to stop Apache from crashing my entire server?

Posted by CyberShadow on Server Fault See other posts from Server Fault or by CyberShadow
Published on 2010-12-11T04:56:21Z Indexed on 2011/01/13 6:54 UTC
Read the original article Hit count: 319

Filed under:
|
|

I maintain a Gentoo server with a few services, including Apache. It's fairly low-end (2GB of RAM and a low-end CPU with 2 cores). My problem is that, despite my best efforts, an over-loaded Apache crashes the entire server. In fact, at this point I'm close to being convinced that Linux is a horrible operating system that isn't worth anyone's time looking for stability under load.

Things I tried:

  1. Adjusting oom_adj for the root Apache process (and thus all its children). That had close to no effect. When Apache was overloaded it would bring the system to a grind, as the system paged out everything else before it got to kill anything.
  2. Turning off swap. Didn't help, it would unload memory paged to binaries of processes and other files on /, thus causing the same effect.
  3. Putting it in a memory-limited cgroup (limited to 512 MB of RAM, 1/4th of the total). This "worked", at least in my own stress tests - except the server keeps crashing under load (basically stalling all other processes, inaccessible via SSH, etc.)
  4. Running it with idle I/O priority. This wasn't a very good idea in the end, because it just caused the system load to climb indefinitely (into the thousands) with almost no visible effect - until you tried to access an unbuffered part of the disk. This caused the task to freeze. (So much for good I/O scheduling, eh?)
  5. Limiting the number of concurrent connections to Apache. Setting the number too low caused web sites to become unresponsive due to most slots being occupied with long requests (file downloads).
  6. I tried various Apache MPMs without much success (prefork, event, itk).
  7. Switching from prefork/event+php-cgi+suphp to itk+mod_php. This improved performance, but didn't solve the actual problem.
  8. Switching I/O schedulers (cfq to deadline).

Just to stress this out: I don't care if Apache itself goes down under load, I just want the rest of my system to remain stable. Of course, having Apache recover quickly after a brief period of intensive load would be great to have, but one step at a time.

Right now I am mostly dumbfounded by how can humanity, in this day and age, design an operating system where such a seemingly simple task (don't allow one system component to crash the entire system) seems practically impossible - or at least, very hard to do.

Please don't suggest things like VMs or "BUY MORE RAM".


Some more information gathered with a friend's help: The processes hang when the cgroup oom killer is invoked. Here's the call trace:

[<ffffffff8104b94b>] ? prepare_to_wait+0x70/0x7b
[<ffffffff810a9c73>] mem_cgroup_handle_oom+0xdf/0x180
[<ffffffff810a9559>] ? memcg_oom_wake_function+0x0/0x6d
[<ffffffff810aa041>] __mem_cgroup_try_charge+0x32d/0x478
[<ffffffff810aac67>] mem_cgroup_charge_common+0x48/0x73
[<ffffffff81081c98>] ? __lru_cache_add+0x60/0x62
[<ffffffff810aadc3>] mem_cgroup_newpage_charge+0x3b/0x4a
[<ffffffff8108ec38>] handle_mm_fault+0x305/0x8cf
[<ffffffff813c6276>] ? schedule+0x6ae/0x6fb
[<ffffffff8101f568>] do_page_fault+0x214/0x22b
[<ffffffff813c7e1f>] page_fault+0x1f/0x30

At this point, the apache memory cgroup is practically deadlocked, and burning CPU in syscalls (all with the above call trace). This seems like a problem in the cgroup implementation...

© Server Fault or respective owner

Related posts about linux

Related posts about apache