On ESXi, guest machines hang for significant intervals compared to real machines. How can I fix this?
- by Tarbox
This is ESXi version 5.0.0.  We plan on upgrading to 5.5 eventually.
I have four code profiles, two taken on a real, unvirtualized machine and two taken on a virtual machine.  Ordering the list of subroutines by time spent in each one, the two real profiles are practically identical.  The two virtual profiles are different from each other and from the real profiles: a subset of subroutines takes a lot more time on the virtual machine, and the subset is different for each run.
The two virtual profiles take a similar amount of time, which is 3 times the amount of time the real profiles take.  This gross "how long does it take?" result is consistent after hundreds of tests across three different virtual machines on two different host machines -- the virtual machine is just slower.
I've only done the code profiling on those four runs, however.  Here's the most guilty set of lines:
This is the real machine:
8µs         $text = '' unless defined $text;
1.48ms          foreach ( split( "\n", $text ) ) {
This is the first run on the virtual machine:
20.1ms      $text = '' unless defined $text;
1.49ms          foreach ( split( "\n", $text ) ) {
This is the second run on the virtual machine:
6µs         $text = '' unless defined $text;
21.9ms          foreach ( split( "\n", $text ) ) {
My WAG is that the VM is swapping out the thread and then swapping it back in, destroying some level of cache in the process.  But these code profiles were taken when the VM in question was the only active VM on the host, so... what?  What does that mean?
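A minimal sketch of one way to check that guess independently of the profiler, assuming Perl 5.10+ with Time::HiRes and a platform that supports CLOCK_MONOTONIC. It spins on the clock and reports any gap between two consecutive readings that exceeds a threshold. The name find_stalls, the 2-second window, and the 5 ms threshold are all hypothetical choices, and a reported gap only shows that the process was stalled, not why.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

# Spin on the monotonic clock for $duration seconds and return one
# record for every gap between consecutive readings that is longer
# than $threshold seconds. (Hypothetical helper, not from any library.)
sub find_stalls {
    my ( $duration, $threshold ) = @_;
    my @stalls;
    my $start = clock_gettime(CLOCK_MONOTONIC);
    my $prev  = $start;
    while (1) {
        my $now = clock_gettime(CLOCK_MONOTONIC);
        my $gap = $now - $prev;
        push @stalls, { at => $prev - $start, gap => $gap }
            if $gap > $threshold;
        $prev = $now;
        last if $now - $start >= $duration;
    }
    return @stalls;
}

# Assumed defaults: watch for 2 seconds, report gaps longer than 5 ms.
my $duration  = shift(@ARGV) // 2;
my $threshold = shift(@ARGV) // 0.005;

for my $s ( find_stalls( $duration, $threshold ) ) {
    printf "stall of %.1f ms at t=%.3f s\n", $s->{gap} * 1000, $s->{at};
}
```

Running the same script on the real machine and on the guest, with the guest otherwise idle, would show whether pauses of roughly 20 ms turn up in code that does nothing but read the clock.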
The guest itself is under light load; this is a latency problem for my users rather than a throughput problem.  The host is also under light load, so if I knew what resources to assign where, I could do it without worrying about the cost.
I've attempted to lock memory, reserve CPU, assign a restrictive affinity, and disable hyperthread sharing.  None of these help; it still takes the VM 2-4x the amount of time to do the same thing as the real machine.
The host the tests were run on is 6x2.50GHz, Intel Xeon E5-2640 w/ 16 GB of RAM.
The guest exhibits the same performance under a wide combination of settings.
The real machine is 4x2.13GHz, Xeon E5506 w/ 2 GB of RAM.
Thank you for all advice.