
frankdenneman.nl

HA admission control is not a capacity management tool.

October 17, 2012 by frankdenneman

I receive a lot of questions about why HA doesn’t seem to work when virtual machines are not configured with VM-level reservations. If no VM-level reservations are used, the cluster indicates a failover capacity of 99%, ignoring the CPU and memory configuration of the virtual machines. Usually my reply is that HA admission control is not a capacity management tool, and I have noticed I’ve been using this statement more and more lately. As explaining it on a per-customer basis doesn’t scale well, it seemed like a good idea to write a blog article about it.
The basics
Sometimes it’s better to review the basics again and understand where the perception of HA and the actual intended purpose of the product part ways.
Let’s start with what HA admission control is designed for. In the Availability Guide the following two statements can be found. Quote 1:

“vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.”

Let’s dive into the first quote, and especially this statement: “to ensure that sufficient resources are available in a cluster”. The key element here is the word sufficient (resources). What is sufficient for customer A is not necessarily sufficient for customer B. As HA does not have an algorithm that decodes the meaning of the word sufficient for each customer, HA relies on the customer to set vSphere resource management allocation settings to indicate how important resource availability is for each virtual machine during resource contention scenarios.
As we are going back to the basics, let’s have a quick look at the resource allocation settings that are used in this case: reservations and shares. A reservation indicates the minimum level of resources available to the virtual machine at all times. This reservation guarantees – or protects might be a better word – the availability of physical resources to the virtual machine regardless of the level of contention. No matter how high the contention in the system is, the reservation prevents the VMkernel from reclaiming that particular CPU cycle or memory page.
This means that when a virtual machine with a reservation is powered on, admission control needs to verify that the host can provide these resources at all times. As the VMkernel cannot reclaim those resources, admission control makes sure that when it lets the virtual machine in, it can keep its promise of providing these resources all the time, and it also checks that doing so won’t introduce problems for the VMkernel itself or for other virtual machines with a reservation. This is the reason why I like to call admission control the virtual bouncer.
Besides reservations we have shares, and shares indicate the relative priority of resource access during contention. A better way to describe this behavior is “opportunistic access”. As the virtual machine is not configured with a reservation, it allows the VMkernel a more relaxed approach to resource distribution. When resource contention occurs, the VMkernel does not need to provide the configured resources at all times, but can distribute the resources based on the activity and the relative priority (derived from the shares) of the virtual machines requesting the resources. Virtual machines configured only with shares simply receive what they can get; there is no restrictive setting for the VMkernel to worry about when running out of resources. Basically, these virtual machines will just get what’s left.
In the case of shares, it’s the VMkernel that decides which virtual machine gets how many resources, in a relaxed and very social way, whereas virtual machines configured with a reservation DEMAND to have their reservations available at all times and do not care about the needs of others.
In other words, the VMkernel MUST provide the resources to the virtual machines with a reservation first and then divvy up the rest amongst the virtual machines that opted for opportunistic distribution (shares).
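To make the distinction concrete, here is a minimal sketch (in Python, not VMkernel code) of the distribution model described above: reservations are carved out first, and whatever is left is divided among the demanding virtual machines in proportion to their shares. All names and numbers are illustrative.

```python
def distribute(capacity_mb, vms):
    """vms: list of dicts with 'name', 'reservation', 'shares', 'demand' (MB, except shares)."""
    # Step 1: reservations are guaranteed, regardless of contention.
    allocations = {vm["name"]: min(vm["reservation"], vm["demand"]) for vm in vms}
    leftover = capacity_mb - sum(allocations.values())

    # Step 2: the leftover is distributed opportunistically, proportional to shares,
    # among the VMs that still demand more than they have been allocated.
    claimants = [vm for vm in vms if vm["demand"] > allocations[vm["name"]]]
    total_shares = sum(vm["shares"] for vm in claimants)
    for vm in claimants:
        extra = leftover * vm["shares"] / total_shares if total_shares else 0
        allocations[vm["name"]] += min(extra, vm["demand"] - allocations[vm["name"]])
    return allocations

# 8GB of contended memory: the reserved VM gets its 4GB first,
# the remainder is split by shares.
print(distribute(8192, [
    {"name": "db01",  "reservation": 4096, "shares": 1000, "demand": 6144},
    {"name": "web01", "reservation": 0,    "shares": 1000, "demand": 4096},
    {"name": "web02", "reservation": 0,    "shares": 1000, "demand": 4096},
]))
```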
How does this tie in with HA admission control?
The second quote gives us this insight:

“vSphere HA: Ensures that sufficient resources in the cluster are reserved for virtual machine recovery in the event of host failure.”

We know that admission control checks whether enough resources are available to satisfy the VM-level reservation without interfering with VMkernel operations or with the VM-level reservations of other virtual machines running on that host. As HA is designed to provide an automated method of host failure recovery, we need to make sure that once a virtual machine is up and running, it can continue to run on another host in the cluster if its current host fails. Therefore the purpose of HA admission control is to regulate and check whether there are enough resources available in the cluster to satisfy the virtual machine level reservations after a host failure occurs.
Depending on the admission control policy, it calculates the capacity required for a failover based on the available resources while still complying with the VMkernel resource management rules. Therefore it only needs to look at VM-level reservations, as shares follow the opportunistic access method.
Semantics of sufficient resources while using shares-only design
In essence, if you use shares, HA relies on you to determine whether the virtual machine will receive the resources you think are sufficient. The VMkernel is designed to allow memory overcommitment while providing performance. HA is just the virtual bouncer that counts the number of heads before it lets the virtual machine into “the club”. If you are on the list for a table, it will get you that table; if you don’t have a reservation, HA does not care if you end up sitting at a 4-person table with 10 other people fighting for your drinks and food. HA relies on the waiters (resource management) to get you (enough) food as quickly as possible. If you want good service and some room at your table, it’s up to you to reserve.
Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman

Filed Under: VMware Tagged With: admission control, HA

FDM in mixed ESX and vSphere clusters

October 17, 2011 by frankdenneman

The last couple of weeks I have been receiving questions about the vSphere HA FDM agent in a mixed cluster. When upgrading vCenter to 5.0, each HA cluster will be upgraded to the FDM agent: a new FDM agent is pushed to each ESX server. The new HA version supports ESX(i) 3.5 through ESXi 5.0 hosts. Mixed clusters are supported, so not all hosts have to be upgraded immediately to take advantage of the new features of FDM. Although mixed environments are supported, we do recommend keeping the time you run different versions in a cluster to a minimum.
The FDM agent is pushed to each host, even if the cluster contains identically configured hosts; for example, a cluster containing only vSphere 4.1 Update 1 hosts will still be upgraded to the new HA version. The only time vCenter will not push the new FDM agent to a host is when the host in question is a 3.5 host without the required patch.
When using clusters containing 3.5 hosts, it is recommended to upgrade the ESX hosts with patch ESX350-201012401-SG (ESX 3.5) or ESXe350-201012401-I-BG (ESXi 3.5) before upgrading vCenter to vCenter 5.0. If you still get the following error message:
Host ‘‘ is of type ( ) with build , it does not support vSphere HA clustering features and cannot be part of vSphere HA clusters.

Visit the VMware knowledgebase article: 2001833.

Filed Under: VMware Tagged With: FDM, HA, Mixed clusters

Setting Correct Percentage of Cluster Resources Reserved

January 20, 2011 by frankdenneman

vSphere introduced the HA admission control policy “Percentage of Cluster Resources Reserved”. This policy allows the user to specify a percentage of the total amount of available resources that will stay reserved to accommodate host failures. When using vSphere 4.1, this policy is the de facto recommended admission control policy, as it avoids the conservative slot calculation method.
Reserved failover capacity
The HA Deepdive page explains in detail how the “Percentage of Cluster Resources Reserved” policy works, but to summarize: the CPU or memory capacity of the cluster is calculated as follows. The available capacity is the sum of the capacity of all ESX hosts inside the cluster minus the virtualization overhead, multiplied by (1 – percentage value).
For instance, a cluster consists of 8 ESX hosts, each containing 70GB of available RAM, and the percentage of cluster resources reserved is set to 20%. This leads to a cluster memory capacity of 448GB: (70GB + 70GB + 70GB + 70GB + 70GB + 70GB + 70GB + 70GB) * (1 – 20%). The remaining 112GB is reserved as failover capacity. Although the example zooms in on memory, the configured percentage applies to both CPU and memory resources.
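As a quick illustration, here is the same calculation in a few lines of Python, using the numbers from the example above (8 hosts, 70GB of usable RAM each, 20% reserved):

```python
hosts_mem_gb = [70] * 8     # usable memory per host, after virtualization overhead
reserved_pct = 0.20

cluster_mem = sum(hosts_mem_gb)                          # 560 GB
available_capacity = cluster_mem * (1 - reserved_pct)    # available for virtual machines
failover_capacity = cluster_mem * reserved_pct           # reserved as failover capacity

print(round(available_capacity), round(failover_capacity))   # 448 112
```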
Once a percentage is specified, that percentage of resources will be unavailable for active virtual machines; therefore it makes sense to set the percentage as low as possible. There are multiple approaches for defining a percentage suitable for your needs. One approach, the host-level approach, is to use a percentage that corresponds with the contribution of one host, or a multiple of that. Another approach, the aggressive approach, sets a percentage that is less than the contribution of one host. Which approach should be used?
Host-level
In the previous example, 20% of the resources in the 8-host cluster were reserved, which is more than a single host contributes to the cluster. High Availability’s main objective is to provide automatic recovery for virtual machines after a physical server failure. For this reason, it is recommended to reserve resources equal to the contribution of a single host, or a multiple of that.
When using the per-host level of granularity in an 8-host cluster (with homogeneously configured hosts), the resource contribution per host to the cluster is 12.5%. However, the percentage used must be an integer (whole number). Using a conservative approach, it is better to round up to guarantee that the full capacity of one host is protected; in this example, the conservative approach leads to a percentage of 13%.
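The round-up can be expressed as a one-liner; a small sketch, assuming homogeneously configured hosts:

```python
import math

# Conservative host-level percentage: round the per-host contribution up,
# so the full capacity of the failed host(s) is protected.
def host_level_percentage(num_hosts, host_failures_to_tolerate=1):
    return math.ceil(100 / num_hosts * host_failures_to_tolerate)

print(host_level_percentage(8))    # 13 (one host out of eight)
print(host_level_percentage(12))   # 9  (one host out of twelve)
```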

Aggressive approach
I have seen recommendations to set the percentage to a value that is less than the contribution of one host to the cluster. This approach reduces the amount of resources reserved for accommodating host failures and results in higher consolidation ratios. One might argue that this approach can work, as most hosts are not fully loaded; however, it eliminates the guarantee that all impacted virtual machines will be recovered after a failure.
As datacenters are dynamic, operational procedures must be in place to avoid or reduce the impact of a self-inflicted denial of service. Virtual machine restart priorities must be monitored closely to guarantee that mission-critical virtual machines are restarted before virtual machines with a lower operational priority. If reservations are set at the virtual machine level, it is necessary to recalculate the failover capacity percentage when virtual machines are added or removed, so that the virtual machines can still power on while the aggressive setting is preserved.
Expanding the cluster
Although the percentage is dynamic and calculates capacity at the cluster level, the contribution per host decreases when the cluster is expanded. If you keep the same percentage setting after adding hosts to the cluster, the amount of resources reserved for a failover might no longer correspond with the contribution per host, and as a result valuable resources are wasted. For example, adding four hosts to an 8-host cluster while continuing to use the previously configured admission control policy value of 13% results in a failover capacity that is equivalent to roughly 1.5 hosts. The following diagram depicts a scenario where an 8-host cluster is expanded to 12 hosts, each with 8 2GHz cores and 70GB of memory. The cluster was originally configured with admission control set to 13%, which equals 109.2GB and 24.96GHz. If the requirement is to be able to recover from one host failure, 9% would have been enough, so 7.68GHz and 33.6GB are “wasted”.
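A short sketch of the arithmetic behind the 12-host example above (the 9% figure is the rounded-up contribution of a single host in a 12-host cluster):

```python
hosts = 12
mem_per_host_gb = 70
cpu_per_host_ghz = 8 * 2.0                     # 8 cores of 2GHz per host

cluster_mem = hosts * mem_per_host_gb          # 840 GB
cluster_cpu = hosts * cpu_per_host_ghz         # 192 GHz

configured_pct = 0.13                          # kept from the original 8-host design
required_pct = 0.09                            # enough to protect one host in a 12-host cluster

wasted_mem = (configured_pct - required_pct) * cluster_mem
wasted_cpu = (configured_pct - required_pct) * cluster_cpu
print(round(wasted_mem, 2), round(wasted_cpu, 2))   # 33.6 GB and 7.68 GHz wasted
```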

Maximum percentage
High availability relies on one primary node to function as the failover coordinator to restart virtual machines after a host failure. If all five primary nodes of an HA cluster fail, automatic recovery of virtual machines is impossible. Although it is possible to set a failover spare capacity percentage of 100%, using a percentage that exceeds the contribution of four hosts is impractical as there is a chance that all primary nodes fail.

Although the configuration of primary nodes and the configuration of the failover capacity percentage are unrelated, they do impact each other. As cluster design focuses on host placement and relies on host-level hardware redundancy to reduce the risk of losing all five primary nodes, admission control can play a crucial part by not allowing more virtual machines to be powered on than can be recovered from a failure of at most four hosts.
This means that the maximum allowed percentage is calculated as the contribution per host multiplied by 4. For example, the recommended maximum configured failover capacity of a 12-host cluster is 34%; this allows the cluster to reserve enough resources for a 4-host failure without over-allocating resources that could be used for virtual machines.
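Expressed as a sketch, again assuming homogeneously configured hosts:

```python
import math

# Maximum recommended failover capacity percentage: never reserve more than
# the contribution of four hosts, the maximum number of primary node failures
# the cluster can survive.
def max_failover_percentage(num_hosts, max_primary_failures=4):
    return math.ceil(100 / num_hosts * max_primary_failures)

print(max_failover_percentage(12))   # 34
```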

Filed Under: VMware Tagged With: HA, Percentage based, VMware

Impact of oversized virtual machines part 2

December 17, 2010 by frankdenneman

In part 1 of this series of posts on the impact of oversized virtual machines, the NUMA architecture, memory overhead reservations and share levels were reviewed; part 2 zooms in on the impact of memory overhead reservations and share levels on HA and DRS.
Impact of memory overhead reservation on HA Slot size
The VMware High Availability admission control policy “Host failures cluster tolerates” calculates a slot size to determine the maximum number of virtual machines that can be active in the cluster without violating failover capacity. This admission control policy determines the HA cluster slot size by taking the largest CPU reservation and the largest memory reservation plus its memory overhead reservation. If the virtual machine with the largest reservation (which could be an appropriately sized reservation) is oversized, its memory overhead reservation can still have a substantial impact on the slot size.
The HA admission control policy “Percentage of Cluster Resources Reserved” calculates the memory component of its mechanism by summing the reservation plus the memory overhead of each virtual machine, thereby allowing the memory overhead reservation to have an even bigger impact on admission control than the calculation done by the “Host failures cluster tolerates” policy.
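To contrast the two calculations, here is a minimal sketch; the virtual machine reservations and overhead figures are made up for illustration:

```python
# (cpu_reservation_mhz, mem_reservation_mb, mem_overhead_mb) per virtual machine
vms = [
    (1000, 2048, 180),
    (0,    4096, 350),   # oversized VM with a large memory overhead reservation
    (500,  1024, 110),
]

# "Host failures cluster tolerates": slot size is based on the largest reservations.
slot_cpu = max(cpu for cpu, _, _ in vms)
slot_mem = max(mem + overhead for _, mem, overhead in vms)
print("slot size:", slot_cpu, "MHz /", slot_mem, "MB")

# "Percentage of Cluster Resources Reserved": sums reservation + overhead of every VM.
required_mem = sum(mem + overhead for _, mem, overhead in vms)
print("memory counted by the percentage policy:", required_mem, "MB")
```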
DRS initial placement
DRS uses a worst-case scenario during initial placement. Because DRS cannot determine the resource demand of a virtual machine that is not running, it assumes that both the memory demand and the CPU demand are equal to the configured size. Oversizing virtual machines therefore reduces the options for finding a suitable host. If DRS cannot guarantee that the full 100% of the resources provisioned for this virtual machine can be used, it will vMotion other virtual machines away so that it can power on this single virtual machine. If there are not enough resources available, DRS will not allow the virtual machine to be powered on.
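A simple way to picture this worst-case assumption is a sketch like the one below; the host names and capacity numbers are hypothetical:

```python
# DRS assumes a powered-off VM demands its full configured size, so only hosts
# with that much unreserved capacity qualify for initial placement.
hosts = {
    "esx01": {"free_cpu_mhz": 4000,  "free_mem_mb": 8192},
    "esx02": {"free_cpu_mhz": 12000, "free_mem_mb": 32768},
}
new_vm = {"configured_cpu_mhz": 8000, "configured_mem_mb": 16384}   # oversized VM

candidates = [name for name, h in hosts.items()
              if h["free_cpu_mhz"] >= new_vm["configured_cpu_mhz"]
              and h["free_mem_mb"] >= new_vm["configured_mem_mb"]]
print(candidates)   # only esx02 qualifies; a right-sized VM would have had more options
```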
Shares and resource pools
When placing a virtual machine inside a resource pool, its shares are relative to the other virtual machines (and resource pools) inside that pool. Shares are relative to all the other components sharing the same parent; an easier way to put it is to call it a sibling share level. Therefore, the numeric share values are not directly comparable across pools, because they are children of different parents.

By default, a resource pool is configured with a share amount equal to that of a 4 vCPU, 16GB virtual machine. As mentioned in part 1, shares are relative to the configured size of the virtual machine, implicitly stating that size equals priority.
Now let’s take another look at the image above. The three virtual machines are reparented to the cluster root, next to resource pools 1 and 2. Suppose they are all 4 vCPU, 16GB machines: their share values are then interpreted in the context of the root pool, and they receive the same priority as resource pool 1 and resource pool 2. This is not only wrong, but also dangerous in a denial-of-service sense: a virtual machine running at the same level as resource pools can suddenly find itself entitled to nearly all cluster resources.
Because of this default share distribution, we always recommend avoiding placing virtual machines at the same level as resource pools. Unfortunately, a virtual machine can end up reparented to the cluster root level when manually migrating it using the GUI; the current workflow defaults to the cluster root level instead of the virtual machine’s current resource pool. Because of this, it is recommended to increase the number of shares of the resource pool to reflect its priority level. More info about shares on resource pools can be found in Duncan’s post on yellow-bricks.com.
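The sibling-share arithmetic can be illustrated with a small sketch; the CPU share values below are the defaults for the “Normal” setting, and the pool contents are hypothetical:

```python
# Siblings at the cluster root: two resource pools and three reparented VMs,
# all with the default "Normal" CPU share value of a 4 vCPU VM / resource pool.
root_siblings = {
    "ResourcePool1": 4000,
    "ResourcePool2": 4000,
    "VM1": 4000,
    "VM2": 4000,
    "VM3": 4000,
}

total = sum(root_siblings.values())
for name, shares in root_siblings.items():
    print(f"{name}: {shares / total:.0%} of the parent's resources during contention")
# Each reparented VM is entitled to the same 20% slice as an entire resource pool,
# no matter how many virtual machines live inside that pool.
```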
Go to Part 3: Impact of oversized virtual machine.

Filed Under: DRS, VMware Tagged With: DRS, HA, memory overhead reservation, Shares, VMware

vSwitch Failback and High Availability

October 22, 2010 by frankdenneman

One setting that catches most admins off guard is the vSwitch Failback setting in combination with HA. If the management network vSwitch is configured with Active/Standby NICs and the HA isolation response is set to “Shut down VM” or “Power off VM”, it is advised to set the vSwitch Failback mode to No. If it is left at the default (Yes), all the ESX hosts in the cluster, or even the entire virtual infrastructure, might trigger an isolation response when one of the management network physical switches is rebooted. Here’s why:
Just a quick rehash:
Active/Standby
One NIC (vmnic0) is assigned as active to the management/service console portgroup; the second NIC (vmnic1) is configured as standby. The vMotion portgroup is configured the other way around: the first NIC (vmnic0) in standby mode and the second NIC (vmnic1) as active.

Active/Standby setup of the management network on vSwitch0

Failback
The Failback setting determines whether the VMkernel returns the uplink (NIC) to active duty after recovery of a downed link or a failed NIC. If Failback is set to Yes, the NIC returns to active duty; if Failback is set to No, the recovered NIC is assigned the standby role and the administrator must manually reconfigure the NIC to the active state.

Effect of the Failback Yes setting on the environment

When using the default Failback setting, unexpected behavior can occur during maintenance of a physical switch. Most switches, like those from Cisco, initialize their ports right after boot, so-called lights-on. The port is active but is still unable to receive or transmit data. The process from lights-on to forwarding mode can take up to 50 seconds; unfortunately, ESX is not able to distinguish between lights-on status and forwarding mode, and therefore treats the link as usable and returns the NIC to active status again.
High Availability will proceed to transmit heartbeats and expect to receive heartbeats; after missing heartbeats for 13 seconds, HA tries to ping its isolation address and, due to the specified isolation response, shuts down or powers off the virtual machines two seconds later so that other ESX hosts can power up the virtual machines. But because it is common, recommended even, to configure each host in the cluster identically, the active NIC used by the management network of every ESX host connects to the same physical switch. Due to this design, once the switch is rebooted, a cluster-wide isolation response occurs, resulting in a cluster-wide outage.
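Putting the numbers from this scenario side by side makes the race obvious; a small sketch using the figures mentioned above:

```python
port_forwarding_delay_s = 50      # lights-on to forwarding mode (worst case)
heartbeat_loss_window_s = 13      # missed heartbeats before HA pings the isolation address
isolation_response_delay_s = 2    # delay before the isolation response is executed

isolation_at = heartbeat_loss_window_s + isolation_response_delay_s
if isolation_at < port_forwarding_delay_s:
    print(f"Isolation response fires at ~{isolation_at}s, "
          f"well before the port starts forwarding at ~{port_forwarding_delay_s}s.")
```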
To allow switch maintenance, it is better to set the vSwitch Failback mode to No. Selecting this setting introduces more manual operations after a failure or certain maintenance operations, but it reduces the chance of “false positives” and cluster-wide isolation responses.

Filed Under: Networking Tagged With: Failover, HA, VMware, vswitch
