Random and selective ARP blindness in VMware ESXi 4.1

Posted by Peter Grace on Server Fault
Published on 2012-10-09T20:31:52Z

We have multiple VMware ESX servers spread across our company, doing various tasks. One particular ESXi host is exhibiting very peculiar behavior. We detect it when our monitoring system (Orion) notifies us that it can no longer ping one of the guests on that host.

Upon jumping on the local console of the guest in question, we see that it cannot ping any address that isn't already in its ARP table.
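One way to see this state from inside a Linux guest is to look for ARP entries that never resolved. The following is only a minimal sketch: it assumes the usual column layout of /proc/net/arp on Linux, the helper name `unresolved_neighbors` is hypothetical, and the addresses in the sample are made up for illustration.

```python
ATF_COM = 0x02  # kernel flag bit: the entry has a resolved hardware address


def unresolved_neighbors(arp_table_text):
    """Return the IPs whose ARP entries lack the 'complete' flag."""
    unresolved = []
    # The first row of /proc/net/arp is a column header, so skip it.
    for row in arp_table_text.strip().splitlines()[1:]:
        fields = row.split()
        if len(fields) < 6:
            continue  # ignore rows that do not match the assumed layout
        ip, flags = fields[0], int(fields[2], 16)
        if not flags & ATF_COM:
            unresolved.append(ip)
    return unresolved


# Made-up sample in the assumed /proc/net/arp layout.
sample = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.1         0x1         0x2         00:11:22:33:44:55     *        eth0
10.0.0.20        0x1         0x0         00:00:00:00:00:00     *        eth0
"""

print(unresolved_neighbors(sample))  # ['10.0.0.20']
```

On a live guest the same function could be fed the contents of /proc/net/arp read shortly after a failed ping. An address that stays in the returned list would match the symptom described above.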

At first we thought the problem was limited to one of our guests, since it always seemed to happen to the same guest, DevRedis. However, this afternoon the problem moved and started happening on ApacheBox instead of DevRedis.

When I have been fortunate enough to catch the problem in progress, I have run tcpdump on both sides of the connection (one side being the VMware guest, the other a physical web server) and have observed the following sequence of events:

  1. Guest ApacheBox sends an ARP request (who-has) for the physical address of server WindowsBeast.
  2. WindowsBeast sends an ARP reply (is-at) back to the network, indicating its physical MAC address.
  3. ApacheBox never sees the ARP is-at reply.
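The comparison of the two captures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure used here: it assumes tcpdump's default one-line text output for ARP ("ARP, Request who-has ... tell ..." and "ARP, Reply ... is-at ..."), the function names are hypothetical, and the sample lines and addresses are invented.

```python
import re

# Patterns for tcpdump's default one-line ARP output (assumed format).
REQUEST = re.compile(r"ARP, Request who-has (\S+)(?: \(\S+\))? tell ([^\s,]+)")
REPLY = re.compile(r"ARP, Reply (\S+) is-at ([0-9a-fA-F:]{17})")


def parse_arp(lines):
    """Return (requests, replies) found in tcpdump text lines.

    requests is a set of (target_ip, asker_ip) pairs;
    replies maps an answered IP to the MAC address it advertised.
    """
    requests, replies = set(), {}
    for line in lines:
        match = REQUEST.search(line)
        if match:
            requests.add((match.group(1), match.group(2)))
            continue
        match = REPLY.search(line)
        if match:
            replies[match.group(1)] = match.group(2).lower()
    return requests, replies


def lost_replies(guest_lines, server_lines):
    """IPs the guest asked about and the server side answered,
    but whose is-at reply never shows up in the guest capture."""
    guest_requests, guest_replies = parse_arp(guest_lines)
    _, server_replies = parse_arp(server_lines)
    return sorted({
        target for target, _asker in guest_requests
        if target in server_replies and target not in guest_replies
    })


# Invented sample captures: the guest side shows only the request.
guest_capture = [
    "20:15:01.000100 ARP, Request who-has 10.0.0.20 tell 10.0.0.10, length 28",
]
server_capture = [
    "20:15:01.000300 ARP, Request who-has 10.0.0.20 tell 10.0.0.10, length 46",
    "20:15:01.000350 ARP, Reply 10.0.0.20 is-at 00:11:22:33:44:55, length 28",
]

print(lost_replies(guest_capture, server_capture))  # ['10.0.0.20']
```

An address in the result means the reply left the physical server but was dropped somewhere between the wire and the guest's network stack, which is the pattern in the three steps above.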

The ESXi host in question is running VMware ESXi 4.1.0, build 348481.

The two guests (DevRedis and ApacheBox) are both running CentOS 6.3, but on two different kernel versions (2.6.32-279.9.1.el6.x86_64 and 2.6.32-279.el6.x86_64), so I'm not entirely sure it's a CentOS problem.

Does anyone have any thoughts on what might cause this? Has anyone run into it before?

© Server Fault or respective owner
