frankdenneman.nl - Page 81 of 89 -

Load Based Teaming

July 21, 2010 by frankdenneman

In vSphere 4.1 a new network Load Based Teaming (LBT) algorithm is available on the distributed virtual switch dvPort groups. The option “Route based on physical NIC load” takes the virtual machine network I/O load into account and tries to avoid congestion by dynamically reassigning and balancing the virtual switch port to physical NIC mappings.
The three existing load-balancing policies, Port-ID, Mac-Based and IP-hash use a static mapping between virtual switch ports and the connected uplinks. The VMkernel assigns a virtual switch port during the power-on of a virtual machine, this virtual switch port gets assigned to a physical NIC based on either a round-robin- or hashing algorithm, but all algorithms do not take overall utilization of the pNIC into account. This can lead to a scenario where several virtual machines mapped to the same physical adapter saturate the physical NIC and fight for bandwidth while the other adapters are underutilized. LBT solves this by remapping the virtual switch ports to a physical NIC when congestion is detected.
After the initial virtual switch port to physical port assignment is completed, Load Based teaming checks the load on the dvUplinks at a 30 second interval and dynamically reassigns port bindings based on the current network load and the level of saturation of the dvUplinks. The VMkernel indicates the network I/O load as congested if transmit (Tx) or receive (Rx) network traffic is exceeding a 75% mean over a 30 second period. (The mean is the sum of the observations divided by the number of observations).
An interval period of 30 seconds is used to avoid MAC address flapping issues with the physical switches. Although an interval of 30 seconds is used, it is recommended to enable port fast (trunk fast) on the physical switches, all switches must be a part of the same layer 2 domain.

DPM scheduled tasks

July 20, 2010 by frankdenneman

vSphere 4.1 introduces a lot of new features and enhancements of the existing features, one of the hidden gems I believe is the DPM enable and disable scheduled tasks. The DPM “Change cluster power settings” schedule task allows the administrator to enable or disable DPM via an automated task. If the admin selects the option DPM off, vCenter will disable all DPM features on the selected cluster and all hosts in standby mode will be powered on automatically when the scheduled task runs.
This option removes one of the biggest obstacles of implementing DPM. One of the main concerns administrators have, is the incurred (periodic) latency when enabling DPM. If DPM place an ESX host in standby mode, it can take up to five minutes before DPM decides to power up the ESX host again. During this (short) period of time, the environment experiences latency or performance loss, usually this latency occurs in the morning.
It’s common for DPM to place ESX hosts in standby mode during the night due to the decreased workloads, when the employees arrive in the morning the workload increases and DPM needs to power on additional ESX hosts. The period between 7:30 and 10:00 is recognized as one of the busiest periods of the day and during that period the IT department wants their computing power lock, stock and ready to go.
This scheduled task will give the administrators the ability to disable DPM before the workforce arrive. Because the ESX hosts remain powered-on until the administrator or a DPM scheduled task enables DPM again, another schedule can be created to enable DPM after the periods of high workload demand ends. To create a scheduled task to disable DPM, open vCenter, go to Home>Management>Scheduled Tasks (CTRL-Shift-T) and select the task “Change cluster power settings”.

Select the default power management for the cluster , On or Off and configure the task.

For example, by scheduling a DPM disable task on every weekday at 7:00, the administrator is ensured that all ESX hosts are powered on before 8 o’clock every weekday in advance of the morning peak, rather than have to wait for DPM to react to the workload increase.
By scheduling the DPM disable task more than one hour in advance of the morning peak, DRS will have the time to rebalance the virtual machine across all active hosts inside the cluster and Transparent Page Sharing process can collapse the memory pages shared by the virtual machines on the ESX hosts. By powering up all ESX hosts early, the ESX cluster will be ready to accommodate load increases.

VM to Hosts affinity rule

July 16, 2010 by frankdenneman

VMware vSphere 4.1 introduces a new affinity rule, called “Virtual Machines to Hosts” (VM-Host). This new rule is available in vSphere 4.1 DRS clusters in addition to the existing (anti) affinity rule, which is now called VM-VM affinity rule. The new VM-Host affinity rule provides the ability of placing a group of virtual machines on a subset of hosts inside the cluster. The new rule can very useful in blade system environments and for honoring ISV license requirements. Rules can be created to ensure that virtual machines run on ESX hosts in different blade chassis for availability reasons, or the complete opposite and limit the virtual machines to ESX hosts inside a blade chassis to optimize network speeds by keeping network traffic inside the blade chassis. VM-host are also very useful to fulfill the requirements of special ISV license models as well, for example restricting Oracle database virtual machines to run only on ESX hosts which are licensed by Oracle.
Difference between VM-Host affinity rules and VM-VM rules
The VM-host affinity rule differ from the VM-VM rule, A VM-Host (anti) affinity rule specify the (anti) affinity between a group of virtual machines and a group of ESX hosts inside the cluster, whether a VM-VM (anti) affinity rule only specify the (anti) affinity between individual virtual machines.
Components.
A virtual machine to host affinity rule exists out of three components:
• Virtual machine DRS group
• ESX host DRS group
• Designation – “Must” affinity\anti-affinity or “Should” affinity\anti-affinity
Virtual machine DRS groups and ESX host DRS Group are quite self-explanatory so let’s dive into the designations component straight away.
Designations
Two different types of VM-Host rules are available, a VM-Host affinity rule can either be a “must” rule or a “should” rule. The must-rule is a mandatory rule for HA, DRS and DPM, it confines or prevent the virtual machines to run on the ESX hosts specified in the ESX host DRS Group.
The “should” rule is a preferential rule for DRS and DPM and expresses a preference. DRS and DPM use their best effort to try to confine or prevent the virtual machines from running on the ESX host they are affined to, but DRS and DPM can violate “should” rules if it compromises certain key operations, HA is not aware of preferential rules because DRS will not communicate these rules to HA.
HA, DRS and DPM must take the mandatory rules into account when generating or executing operations. HA, DRS and DPM will never take any action that result in the violation of mandatory affinity rules. Because of this, mandatory rules place more constraints on VM mobility, making it more difficult for DRS to balance load and enforce resource allocation policies, HA and DPM operations are constrained as well, for example, mandatory rules will;
• Limit DRS in selecting hosts to load-balance the cluster
• Limit HA in selecting hosts to power up the virtual machines
• Limit DPM in selecting hosts to power down
Due its limiting behavior, it is recommended to use mandatory rules sparingly and only for specific cases, such as licensing requirements. Preferential rules can be used to meet availability requirements such as separating virtual machines between blade centers.
DRS and mandatory rules
DRS takes mandatory rules into account when generating load-balance recommendations. If a rule is created and the current virtual machine placement is in violation with the rule, DRS will create a priority one recommendation (five stars) and executes the recommendation if DRS is set to fully automatic. DRS will not generate recommendation that will violate the rule, it will not migrate virtual machines to or from an ESX server, even if places the source ESX host into maintenance mode. VMotion will reject the operation as well if it detects that the operation is in violation of the mandatory rule
If a reservation is set on the virtual machine, DRS takes both reservation and mandatory affinity rule into account. Both requirements must be satisfied during placement or power on. If DRS is unable to honor either one of the requirements the virtual machine is not powered on or migrated to the proposed destination host. For example if a new rule is created and the current virtual machine placement is in violation of the rule, it can only migrate to a new host if the virtual machine memory reservation can be satisfied on the new host, if this is not possible, DRS will not generate the recommendation.
If a rule is created that conflict with another active, the older rule overrules the newer rule and DRS will disable the new rule.
As you can imagine that mandatory affinity rules can complicate troubleshooting in certain scenarios for example, why a virtual machine is not migrated from a highly utilized host to an alternative lightly utilized host in the cluster.
DPM
DPM does not place an ESX host into standby mode if it will violate the mandatory rule and will power-on ESX hosts if these are needed to meet the requirements of the mandatory riles.
High Availability
Due to the DRS-HA integration in vSphere 4.1, HA respects mandatory (must) rules. During an ESX host failure event, HA ask DRS to supply the list of hosts and places the virtual machines only on the compatible host, i.e. the host that are allowed by the mandatory rules. HA is unaware of the preferential (should) rules, so HA might unknowingly violate the rule during placement of virtual machines after an ESX failure, but the violation will be corrected by the next DRS invocation.
Let’s take a look at a configuration which I think is going to be widely implemented soon, the Oracle Must affinity rule.
1. Place all Oracle virtual machines in a Cluster VM DRS group. (vm01, vm03, vm11, vm20)
2. Place all Oracle licensed ESX host in a Cluster Host DRS Group (ESX07, ESX08, ESX15, ESX16)
3. Select “Must run on Host in Group”

In this scenario, DRS never places, migrates, or recommend placement of a host-affined virtual machine on a host to which is not listed in the Cluster Host DRS Group (ESX01 – ESX06 & ESX09-ESX14). This means that DRS will never ever place the virtual machine on an unlicensed host, not for maintenance mode, not for DPM power saving and not after an ESX host failure event.
This virtual machine to host affinity rule make it possible to run oracle inside big clusters without having to license all the ESX host. I have been involved in a few projects where Oracle license was a constraint. Normally separate smaller clusters were deployed for Oracle database virtual machines, increasing both OPEX and CAPEX of the environment. These rules allows the Oracle virtual machines to run inside the cluster with other virtual machines without having to license all the ESX host inside the cluster. Hereby making the lives easier of both the architect and the administrator. vSphere 4.1, you gotta love it!
Get notification of these blogs postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman

VMware Fault Tolerance and DPM

June 20, 2010 by frankdenneman

Some requirements of the design I am working on is to be as “green” as possible and to offer the highest level of redundancy for business continuity. Enter VMware Fault Tolerance (FT) and Distributed Power Management (DPM)! When mixing multiple features, the requirements of one feature can have impact on- or even worse becomes a constraint of the other feature.
DPM works together with DRS to VMotion virtual machines onto fewer ESX host servers when the resource demand drops below a specific threshold. In the current release of vSphere, DRS does not consider the FT-enabled virtual machines during load balancing operations and DRS will not migrate FT-enabled virtual machine automatically, because of this DPM cannot power down these hosts until the administrator will manually VMotion the primary or secondary virtual machines to another ESX host server.
Fortunately when enabling DPM on the cluster, you can disable DPM at ESX host level. Due to the current limitations of DRS with VMware Fault Tolerance, it is recommended to disable DPM on at least two ESX server host to act as host for FT-enabled virtual machines.

Memory reclamation, when and how?

June 11, 2010 by frankdenneman

After discussing with Duncan the performance problem presented by @heiner_hardt , we discussed the exact moment the VMkernel decides which reclamation technique it will use and specific behaviors of the reclamation techniques. This article supplements Duncan’s article on Yellow-bricks.com.
Now let’s begin with when the kernel decides to reclaim memory and see how the kernel reclaims memory. So host physical memory is reclaimed based on four “free memory states”, each with a corresponding threshold. Based on the Threshold, the VMkernel chooses which reclamation technique it will use to reclaim memory from virtual machines.

Free Memory state	Threshold	Reclamation technique
High	6%	None
Soft	4%	Ballooning
Hard	2%	Ballooning and Swapping
Low	1%	Swapping

The high memory state has a threshold hold of 6%, that means that 6% of the ESX host physical memory minus the service console memory must be free. When the virtual machines use less than 94% of the host physical memory, the VMkernel will not reclaim memory because there is no need to, but when the memory usage starts to fall towards the free memory threshold the VMkernel will try to balloon memory. The VMkernel selects the virtual machines with the largest amounts of idle memory (detected by the idle memory tax process) and will ask the virtual machine to select it’s idle memory pages. Now to do this the guest os needs to swap those pages, so if the guest is not configured with sufficient swap space, ballooning can become problematic. Linux behaves pretty worse in this situation, invoking OOM (out-of memory) killer when its swap space is full and starts to randomly kill processes.
Back to the VMkernel, in the High and Soft state, ballooning if favored over swapping. If it ESX server cannot reclaim memory by ballooning in time before it reaches the Hard state, the ESX turns to swapping. Swapping has proven to be a sure thing within a limited amount of time. Opposite of the balloon driver, which tries to understand the needs of the virtual machine let the guest decides whether and what to swap, the swap mechanism just brutally picks pages at random from the virtual machine, this impacts the performance of the virtual machine but will help the VMkernel to survive.
Now the fun thing is, before the VMkernel detects the free memory is reaching the soft threshold, it will start to request pages through the balloon driver (vmmemctl), this is because it takes time for the Guest OS to respond to the vmmemctl driver with suitable pages. By starting prematurely, the VMkernel tries to avoid the situation that it will reach the Soft state or worse. So you can see ballooning occurring sometimes before the Soft state is reached. (between 6 and 4% free memory)
One exception is the virtual machine memory limit, if a limit is set on the virtual machine, the VMkernel always tries to balloon or swap pages of the virtual machine after reaching its limit, even if the ESX host has enough free memory available.