In vSphere 4.1, a new network load-balancing algorithm, Load Based Teaming (LBT), is available on distributed virtual switch dvPort groups. The option “Route based on physical NIC load” takes the virtual machine network I/O load into account and tries to avoid congestion by dynamically reassigning and balancing the virtual switch port to physical NIC mappings.
The three existing load-balancing policies, Port-ID, MAC-based and IP-hash, use a static mapping between virtual switch ports and the connected uplinks. The VMkernel assigns a virtual switch port during the power-on of a virtual machine; this virtual switch port is then assigned to a physical NIC based on either a round-robin or a hashing algorithm, and none of these algorithms take the overall utilization of the pNIC into account. This can lead to a scenario where several virtual machines mapped to the same physical adapter saturate the physical NIC and fight for bandwidth while the other adapters are underutilized. LBT solves this by remapping the virtual switch ports to a different physical NIC when congestion is detected, as the sketch below illustrates.
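To make the static behavior concrete, here is a minimal Python sketch of a Port-ID style assignment. It is purely illustrative, not VMware's actual implementation; the point is that the uplink choice is fixed at power-on and utilization is never consulted.

```python
# Hypothetical sketch (the real VMkernel logic is not public) of a
# static, Port-ID style assignment: the uplink is chosen once at
# power-on and utilization is never consulted.

UPLINKS = ["vmnic0", "vmnic1"]

def assign_uplink(virtual_port_id: int) -> str:
    # Round-robin over the uplinks by virtual switch port number.
    return UPLINKS[virtual_port_id % len(UPLINKS)]

# Ports 0 and 2 both land on vmnic0; if both VMs are heavy talkers
# they saturate vmnic0 while vmnic1 sits idle, and this static
# policy will never move them.
for port in range(4):
    print(port, "->", assign_uplink(port))
```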
After the initial virtual switch port to physical port assignment is completed, Load Based Teaming checks the load on the dvUplinks at a 30-second interval and dynamically reassigns port bindings based on the current network load and the level of saturation of the dvUplinks. The VMkernel marks the network I/O load as congested if transmit (Tx) or receive (Rx) network traffic exceeds a 75% mean over a 30-second period. (The mean is the sum of the observations divided by the number of observations.)
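The exact sampling rate and counters are not documented, but the congestion test itself is simple to express. A minimal Python sketch, assuming one utilization sample per second over the 30-second window:

```python
# Minimal sketch of the congestion test described above; the real
# sampling rate and counters are not public, so this assumes one
# utilization sample per second over the 30-second window.

def mean(samples: list[float]) -> float:
    # The mean: sum of the observations divided by their number.
    return sum(samples) / len(samples)

def is_congested(tx_util: list[float], rx_util: list[float],
                 threshold: float = 0.75) -> bool:
    # A dvUplink is flagged congested if mean Tx *or* mean Rx
    # utilization over the window exceeds the threshold.
    return mean(tx_util) > threshold or mean(rx_util) > threshold

# Example: a link averaging 80% transmit utilization is congested
# even though receive traffic is nearly idle.
tx = [0.80] * 30  # thirty one-second samples
rx = [0.05] * 30
assert is_congested(tx, rx)
```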
An interval period of 30 seconds is used to avoid MAC address flapping issues with the physical switches. Even with this interval, it is recommended to enable PortFast (trunk fast) on the physical switches, and all switches must be part of the same layer 2 domain.
I don't think “IP-hash use(s) a static mapping between virtual switch ports and the connected uplinks” is true; traffic to different target IPs may go out over different pNICs (it's src-dst, right?)
Hi Tommy,
In an absolute sense you are correct, but IP-hash uses a static mapping depending on the hash result of the source-destination (src-dst) calculation.
The VMkernel does not dynamically remap the connection based on network I/O load, so the connection can continue to suffer a performance impact due to other network traffic flowing through the same pNIC.
Although IP-hash creates an outbound load-balancing pattern for the virtual machine itself, it ignores the overall utilization of the pNIC. From a utilization point of view, it remains static.
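A small illustration of that point (the hash below is made up; the real algorithm is internal to the vSwitch): each source/destination pair is pinned to one uplink, no matter how busy that uplink gets.

```python
# Illustrative only; the real hash is internal to the vSwitch. The
# point: each src/dst IP pair lands on a fixed uplink, and load on
# that uplink never changes the outcome.

UPLINKS = ["vmnic0", "vmnic1"]

def ip_to_int(ip: str) -> int:
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def ip_hash_uplink(src: str, dst: str) -> str:
    return UPLINKS[(ip_to_int(src) ^ ip_to_int(dst)) % len(UPLINKS)]

# One VM talking to two targets may use both uplinks...
print(ip_hash_uplink("10.0.0.5", "10.0.0.10"))  # vmnic1
print(ip_hash_uplink("10.0.0.5", "10.0.0.11"))  # vmnic0
# ...but each conversation is pinned: repeated calls always return
# the same uplink, regardless of how busy it is.
```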
Hi Frank, I enjoy your articles!
I am not sure how this threshold is calculated … “The VMkernel indicates the network I/O load as congested if transmit (Tx) or receive (Rx) network traffic is exceeding a 75% mean over a 30 second period.”
What is the sampling rate?
What counters? Throughput (KB/MB) or IOPS or an either/combo?
Does this mean that the threshold is reached when a sample is in the 75th percentile over the last 30 seconds, or is it calculated in some other way?
A little confused,
Rob
Hi Rob,
Thanks for the compliment!
Unfortunately, the information about this new feature is very sparse.
I think it does not really matter which unit (KB/MB or IOPS) is used; it's still 75% of a total number.
The exact calculation is unknown to me, but the information I received is that if transmit or receive traffic exceeds a 75% mean over a 30-second period, the VMkernel will signal the link as congested. The mean equals the sum of the observations divided by the number of observations. How this relates to a total number is unknown to me. I have already requested more information; if I receive it and I'm allowed to post it, I certainly will.
Hi Frank,
Is this feature available with all the ESX licenses or is it from a certain level up?
Thanks!
Dan, it's only available on distributed vSwitches. I'm not sure, but I think you need an Enterprise Plus license for dvSwitches.
Thanks for the response Frank
Rob
Hi Frank, great post as always.
I have updated my dvSwitch post, see http://www.lucd.info/2009/10/12/dvswitch-scripting-part-2-dvportgroup/.
The script now supports teaming, including load based teaming as described in your post.
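For anyone doing the same from Python instead of PowerCLI, below is a rough pyVmomi sketch of the change. The host name, credentials and the find_dvportgroup() helper are placeholders; 'loadbalance_loadbased' is the policy value behind “Route based on physical NIC load”.

```python
# Rough pyVmomi sketch of enabling LBT on an existing dvPortgroup.
# Host, credentials and find_dvportgroup() are placeholders.

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator", pwd="secret")

pg = find_dvportgroup(si, "dvPortGroup-VMs")  # hypothetical lookup helper

spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
spec.configVersion = pg.config.configVersion
port_config = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
port_config.uplinkTeamingPolicy = \
    vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        policy=vim.StringPolicy(value="loadbalance_loadbased"))
spec.defaultPortConfig = port_config

task = pg.ReconfigureDVPortgroup_Task(spec)  # apply the change
```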
Thanks a lot Frank, it looks like you read my mind: I have been thinking for a few days about building a setup of some vESXi hosts to test the efficiency of this new feature. Now I know how it works 🙂 I'll give you the results if you're interested.
BTW, have you heard about any plans to implement this in standard vSwitches?
Thanks again for sharing your knowledge !
From a consulting perspective, I’m both extremely happy to see this…and also extremely annoyed.
Simply put, much of my customer base (I work in the channel) does not have or need Enterprise Plus (nor can they necessarily justify affording it when we’re up against Hyper-V).
So….I’m really happy to see this and plan to work it into my best practices setup….for maybe 20% of my customers that have Enterprise Plus. 🙁
“all switches must be a part of the same layer 2 domain”
This means that you cannot connect the ESX NICs to separate switches, only to switches that are stacked?
Thank you!
Dan,
a layer 2 domain effectively relates to a VLAN; if that VLAN spans multiple switches, then the domain spans those switches.
Hi Frank,
Can you please elaborate on where you found the resource that supports your sentence: “An interval period of 30 seconds is used to avoid MAC address flapping issues with the physical switches.”
I don’t doubt your accuracy but I just cannot find anywhere that says “30 seconds”. The closest I can find is at this blog http://routing-bits.com/2012/10/24/detecting-layer2-loops/ where the author says “When NX-OS detect a series of MAC flap events that exceeds an Cisco defined limit”. Unfortunately I cannot find what the “Cisco defined limit” is! 🙁
The reason why this came up is that we were having a discussion at work about how misconfigured EtherChannels on a vSwitch can cause (very) frequent MAC flaps. But then so can LBT, yet this is a VMware “recommended” setting. Obviously, if we never hit the 75% threshold for a period longer than 30 seconds then no MAC flaps will occur and all will be good 🙂
Many thanks,
KFM
The 30-second interval period stems from research by our engineers. As VMware we need to make sure it works with all the supported products. Engineering conducted research and made sure that moving flows among uplinks would not cause any problems on the physical switch side. LBT is designed to avoid MAC flapping: we send an RARP packet to update the physical switch when a vNIC to pNIC mapping changes, so the MAC address is only seen on a single port.
Changing mappings at a more frequent pace might be beneficial to the vSphere layer but might cause problems at the physical layer; 30 seconds turned out to be a suitable interval.
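To make that concrete, here is a toy Python model (not VMware code): rebalance passes are gated to the 30-second interval, and every vNIC that actually moves triggers a single RARP-style announcement, so the physical switch learns the MAC on its new port without flapping. The class and function names are invented for illustration.

```python
# Toy model, not VMware code: rebalance passes are gated to the
# 30-second interval, and each vNIC that actually moves triggers a
# single RARP-style announcement.

import time

INTERVAL = 30.0  # seconds between rebalance passes

def send_rarp(mac: str, pnic: str) -> None:
    # Stand-in for the VMkernel's reverse-ARP notification.
    print(f"RARP: {mac} now reachable via {pnic}")

class LbtTeaming:
    def __init__(self, mapping: dict[str, str]):
        self.mapping = mapping              # vNIC MAC -> pNIC
        self.last_pass = float("-inf")

    def rebalance(self, moves: dict[str, str]) -> None:
        now = time.monotonic()
        if now - self.last_pass < INTERVAL:
            return                          # too soon; avoid MAC flapping
        self.last_pass = now
        for mac, pnic in moves.items():
            if self.mapping.get(mac) != pnic:
                self.mapping[mac] = pnic
                send_rarp(mac, pnic)        # announce the new location once
```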