Had an interesting issue today: an insanely slow, brand new server... or so I thought. First, a bit of background on the client. They are fairly large, with over 400 client access devices to maintain, not including servers, network equipment and so on. To support this the client has three servers. One is purchased each year and the oldest one is thrown out, which keeps all of them in warranty and keeps things running on modern, powerful equipment. In addition to this there are two other servers that are replaced every three years. These are treated differently because they are speciality servers and only do one task.
So, I was building a new server for the client, in this case a Dell R520 with Server 2008 R2 running as a hypervisor. Nothing special in that; I do this so regularly that it has become a routine build for me. What got me with this one, though, was that during testing I was getting insanely high ping latency, not only to the virtual machines from the network and vice versa, but also from the hypervisor to its own virtual machines and vice versa. Pings to other virtual machines on other servers, on different LAN segments, were all responding normally in <1 ms.
My first thought was that there was something wrong with the virtual machines and that I had butchered something in the migration, but they worked on other hypervisors without delays, so that knocked that one on the head. Then I thought it might be a network location issue, but that does not make any sense, because a ping from a hypervisor to its own guest does not go across a physical network. So it had to be the brand new server.
OK, so what's new about this server? Well, it's got newer processors, more memory and faster HDDs with larger capacities; basically it was more a case of what wasn't different from the last server. I am not going to go through the whole troubleshooting process, but basically it was to do with the NICs. Fine, but what about them? After trial and error and, of course, every tech's most important tool, Google, I came across the issue. What is it...
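For anyone wanting to repeat the comparison I was doing by eye, here is a rough Python sketch, not the tooling I actually used. It parses the reply times out of ping output and flags a target whose average is above a threshold. The function names, the 1 ms threshold and the sample output are all my own assumptions for illustration.

```python
import platform
import re
import subprocess
from statistics import mean

# Matches "time=187ms" and "time<1ms" (Windows) as well as "time=0.345 ms" (Linux).
_TIME_RE = re.compile(r"time[=<]\s*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def parse_ping_times(output):
    """Return the round-trip times (in ms) found in raw ping output.

    A Windows "time<1ms" reply only gives an upper bound, so it is
    recorded as 1.0 here.
    """
    return [float(value) for value in _TIME_RE.findall(output)]


def summarize(label, output, threshold_ms=1.0):
    """Build a one-line summary and flag averages above threshold_ms."""
    times = parse_ping_times(output)
    if not times:
        return f"{label}: no replies"
    avg = mean(times)
    status = "HIGH" if avg > threshold_ms else "ok"
    return f"{label}: avg {avg:.1f} ms over {len(times)} replies [{status}]"


def ping(host, count=4):
    """Run the system ping command and return its raw text output."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), host],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `summarize("guest on new host", ping("10.0.0.5"))` (a made-up address) for a guest on the new server and again for a guest on an older hypervisor gives two lines that can be compared directly.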
THE ISSUE IS VMQ, or Virtual Machine Queue, a setting inside the Broadcom NIC drivers. Disable this and the issue clears instantly.
Pings and other indicators are now back down to <1ms which is what I expected to see in the first place.
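I turned VMQ off through the NIC driver settings, but on a host with several ports it can help to list which adapters still have it enabled. The sketch below is an assumption on my part rather than something I ran on this build. It assumes the driver stores the setting under the standardized `*VMQ` keyword ("1" for enabled, "0" for disabled) in the network adapter class key of the registry; I have not confirmed that this Broadcom driver does so. The function names are made up, and it only reads values, it changes nothing.

```python
# Registry key (under HKEY_LOCAL_MACHINE) that holds one subkey per network adapter.
NET_CLASS_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E972-E325-11CE-BFC1-08002BE10318}"
)


def adapters_with_vmq_enabled(adapters):
    """Return descriptions of adapters whose "*VMQ" setting is "1".

    adapters is an iterable of (description, settings) pairs, where
    settings is a dict of advanced-property names to values.
    """
    return [
        description
        for description, settings in adapters
        if str(settings.get("*VMQ", "")).strip() == "1"
    ]


def read_adapters():
    """Windows only: collect (description, settings) pairs from the registry."""
    import winreg  # imported here so the rest of the module loads anywhere

    found = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NET_CLASS_KEY) as class_key:
        index = 0
        while True:
            try:
                subkey_name = winreg.EnumKey(class_key, index)
            except OSError:
                break  # no more subkeys to enumerate
            index += 1
            try:
                with winreg.OpenKey(class_key, subkey_name) as adapter_key:
                    description = winreg.QueryValueEx(adapter_key, "DriverDesc")[0]
                    try:
                        vmq = winreg.QueryValueEx(adapter_key, "*VMQ")[0]
                    except FileNotFoundError:
                        vmq = None  # this adapter does not expose the keyword
            except OSError:
                continue  # unreadable subkey or one that is not an adapter
            found.append((description, {} if vmq is None else {"*VMQ": vmq}))
    return found
```

On the hypervisor itself, `print(adapters_with_vmq_enabled(read_adapters()))` would list the adapters to look at, assuming the keyword is present at all.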
Hardware affected by this was as follows:
Server 2008R2 Enterprise
Broadcom Quad Port NIC
Broadcom Driver 126.96.36.199 dated 4th of September 2012, as downloaded from the DELL site on the 31st of January 2013
Going to put this on my to-check list in future.
UPDATE 11th February 2013:
Dennis over at Flexecom found the same thing in this posting (http://www.flexecom.com/high-ping-latency-in-hyper-v-virtual-machines/), posted on the 10th of December. I wish I had found it earlier so that I did not have to troubleshoot this myself. Nonetheless, he has more information on how VMQs are MEANT to work. Interestingly, although the manufacturer is different, the NIC is the same, as is the driver version, although the reported release date of the driver is different. So currently the problem seems to exist with BROADCOM NICs, specifically with driver revision 188.8.131.52. Perhaps we could get Broadcom to turn this off by default; then, if desired, the server admin could turn it on.