HA Percentage based admission control from a resource management perspective – Part 1

February 15, 2013 by frankdenneman

Disclaimer: This article contains references to the words master and slave. I recognize these as exclusionary words. The words are used in this article for consistency because those are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment.

HA admission control is quite challenging to understand as it interacts with multiple layers of resource management. In the upcoming series of articles I want to focus on HA percentage-based admission control and how it interacts with vCenter and host management. Let's cover the basics first before diving into percentage-based admission control.

Virtual machine service level agreements
HA provides the service to start up a virtual machine when the host it's running on fails. That's what HA is designed to do. However, HA admission control is tightly interlinked with virtual machine reservations, because a reservation is a hard SLA for the entire virtual infrastructure. Let's focus on the two different "service level agreements" you can define for your virtual machine.

Share-based priority: A share-based priority allows the virtual machine to get all the resources it demands until demand exceeds supply on the ESXi host. Depending on the number of shares and the activity, the resource manager determines the relative priority of resource access. Resources are reclaimed from inactive virtual machines and distributed to high-priority active virtual machines. If all virtual machines are active, the share values determine the distribution of resources. Let's coin the term "Soft SLA" for share-based priority, as the resource manager allows a virtual machine to be powered on even if there are not enough resources available to provide an adequate user experience/performance of the application running inside the virtual machine.

The resource manager simply distributes resources based on the share values set by the administrator; it expects that the shares were set correctly to provide an adequate performance level at all times.
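To make that distribution concrete, here is a minimal sketch (an illustration only, not vSphere code) of how share values translate into a proportional split of a contended resource when all virtual machines are actively demanding it:

```python
# Illustration only: proportional distribution of a contended resource based on
# share values, assuming every virtual machine is actively demanding memory.
def distribute_by_shares(capacity_gb, shares):
    """shares: dict of vm_name -> share value."""
    total_shares = sum(shares.values())
    return {vm: capacity_gb * s / total_shares for vm, s in shares.items()}

# Example: 100 GB of contended capacity, one VM with twice the shares of the other two.
print(distribute_by_shares(100, {"vm-high": 2000, "vm-a": 1000, "vm-b": 1000}))
# {'vm-high': 50.0, 'vm-a': 25.0, 'vm-b': 25.0}
```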

Reservation-based priority: A reservation can be defined as a "Hard SLA". Under all circumstances, the resources protected by a reservation must be available to that particular virtual machine. Even if every other virtual machine is in need of resources, the resource manager cannot reclaim these resources, as it is restricted by this Hard SLA. It must provide them. In order to meet the SLA, the host checks whether it has enough free unreserved resources available during the power-on operation of the virtual machine. If it doesn't have the necessary unreserved resources available, the host cannot meet the Hard SLA and therefore rejects the virtual machine. Go look somewhere else buddy 😉
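A minimal sketch of that host-level check (simplified and purely illustrative, not the actual VMkernel logic): the power-on is only admitted when the host's unreserved capacity covers the VM reservation plus the overhead reservation.

```python
# Simplified sketch of host admission control for a power-on request (hard SLA).
def host_admits(host_unreserved_gb, vm_reservation_gb, vm_overhead_gb):
    return host_unreserved_gb >= vm_reservation_gb + vm_overhead_gb

print(host_admits(10, 8, 0.2))   # True: 8.2 GB fits in 10 GB of unreserved capacity
print(host_admits(10, 12, 0.2))  # False: the hard SLA cannot be met, the power-on is rejected
```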

Percentage based admission control
The percentage-based admission control policy is my favorite HA admission control policy because it gets rid of the overly conservative slot mechanism used by the "host failures cluster tolerates" policy. With percentage-based admission control, you set a percentage of the cluster resources that will be reserved for failover capacity.

For example, when you set the reserved failover memory capacity to 25%, 25% of the cluster resources are reserved. In a 4-host cluster this makes sense, as the 25% embodies the available resources of a single host; thus 25% equals a failover tolerance of one host. If one host fails, the remaining three hosts, which together provide the other 75% of cluster resources, can restart the virtual machines that were running on the failed host.
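For identically sized hosts, the percentage that corresponds to a given host failover tolerance is simply the number of host failures divided by the number of hosts. A tiny, purely illustrative helper:

```python
# Hypothetical helper: percentage of cluster resources to reserve so that a cluster
# of identically sized hosts can tolerate a given number of host failures.
def failover_percentage(hosts, host_failures_to_tolerate):
    return 100 * host_failures_to_tolerate / hosts

print(failover_percentage(4, 1))  # 25.0 -> the 4-host example above
print(failover_percentage(8, 2))  # 25.0 -> tolerating two failures in an 8-host cluster
```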

[Figure: failover capacity]
For the sake of simplicity, this diagram shows an equal load distribution; however, due to VM reservations and other factors, the actual distribution of virtual machines might differ.

Let's take a closer look at that 25% and that 75%. The 25% of reserved failover memory capacity is not reserved on a per-host basis; therefore the previous diagram is not completely accurate. This failover capacity is tracked and enforced by HA at the vCenter layer, or to be more precise, at the HA cluster level. This is crucial information for understanding the difference between admission control during normal provisioning/power-on operations and admission control during restart operations performed by HA.

25% reserved failover memory capacity
The resource allocation tab of the cluster shows what happens after enabling HA. The first screenshot is the resource allocation of the cluster before HA is enabled. Notice the 1 GB reservation.

[Figure: cluster resource allocation before HA]
[Figure: reserved failover memory capacity setting]
When setting the reserved failover memory capacity to 25%, the following happens:
[Figure: cluster resource allocation with HA enabled]
25% of the cluster capacity (363.19 GB * 0.25 = 90.79 GB) is added to the reserved capacity, plus the existing 1.03 GB, bringing the total reserved capacity to 91.83 GB. This means that this cluster has 271.37 GB of available capacity left. What exactly is this available capacity? It is the capacity often referred to as "unreserved capacity". What will happen to this capacity when we power on a 16 GB virtual machine without a reservation? Will it reduce the available capacity to 255.37 GB? No, it will not. This graph only shows how much of the total capacity is assigned to an object with a hard SLA (a reservation).
[Figure: cluster reserved capacity]
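A quick sketch of the arithmetic behind these numbers (the values are taken from the resource allocation tab above; small differences are due to rounding in the UI):

```python
# Recomputing the reserved and available (unreserved) capacity shown in the screenshots.
cluster_capacity_gb = 363.19        # total cluster memory capacity
failover_percentage = 0.25          # reserved failover memory capacity
existing_reservation_gb = 1.03      # reservation that already existed before enabling HA

ha_reserved_gb = cluster_capacity_gb * failover_percentage      # ~90.79 GB
total_reserved_gb = ha_reserved_gb + existing_reservation_gb    # ~91.83 GB
available_gb = cluster_capacity_gb - total_reserved_gb          # ~271.36 GB (the UI shows 271.37)

print(round(total_reserved_gb, 2), round(available_gb, 2))
```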
Thus, when a virtual machine is powered on or provisioned into the cluster via vCenter by the user, it goes through HA admission control first:
[Figure: HA admission control]
After HA accepts the virtual machine, DRS admission control and host admission control review the virtual machine before it is powered on. The article "Admission control family" describes the admission control workflow in depth.

75% unreserved capacity
What happened to that 90 GB? Is it gone? Is a part of this capacity reserved on each host in the cluster and unavailable for virtual machines to use? No, luckily the 90 GB is not gone; HA just reduced the available capacity so that during a placement operation (deployment or power-on of an existing VM) vCenter knows whether the cluster can meet the hard SLA of a reservation. To illustrate this behavior I took a screenshot of the esxtop output of one of the hosts:

[Screenshot: esxtop output]
In this capture, you can see that the host is serving 5 virtual machines (W2K8-00 to W2K8-04). Each virtual machine is configured with 16 GB (MEMSZ) and the resource manager has assigned a size target above the 16 GB (SZTGT). This size target is the amount of resources the resource manager has allocated to the virtual machine. The reason it's higher than the memory size is the overhead memory reservation: the memory needed by the VMkernel to run the virtual machine. As you can see, these 5 virtual machines use up 82 GB, which is more than the 67.5 GB the host is supposed to hand out if 25% were reserved as failover capacity on each host.
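A short sketch of that arithmetic (the per-VM overhead and host capacity below are illustrative assumptions, not values read from the screenshot):

```python
# Illustration: five 16 GB virtual machines plus a per-VM overhead reservation add up to
# more memory than a naive "25% held back per host" model would allow.
vm_memsize_gb = 16
overhead_gb = 0.4        # illustrative VMkernel overhead reservation per VM
vm_count = 5
host_capacity_gb = 90    # illustrative host memory size

sztgt_total = vm_count * (vm_memsize_gb + overhead_gb)
print(sztgt_total)                            # 82.0 GB allocated by the resource manager
print(host_capacity_gb * 0.75)                # 67.5 GB "allowed" if 25% were held back per host
print(sztgt_total > host_capacity_gb * 0.75)  # True: failover capacity is not enforced per host
```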

Failover process and the role of host admission control
This is key to understanding why HA "ignores" reserved failover capacity during a failover process. As HA consists of FDM agents running on each host, it is the master FDM agent that reviews the protected list and initiates a power-on operation for a virtual machine that is listed as protected but is not running. The FDM agent ensures that the virtual machines are powered on. As you can see, this all happens at the host level; vCenter is not included in this party. Therefore the virtual machine start-up operation is reviewed by host admission control. If the virtual machine is configured with a soft SLA, host admission control only checks whether it can satisfy the VM overhead reservation. If the VM is protected by a VM reservation, host admission control checks whether it can satisfy both the VM reservation and the VM overhead reservation. If it cannot, it will fail the startup and FDM has to find another host that can run this virtual machine. If all hosts fail to power on the virtual machine, HA will request DRS to "defragment" the cluster by moving virtual machines around to make room on a host and free up some unreserved capacity.
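That restart flow can be summarized in a rough sketch (heavily simplified and purely illustrative; the actual FDM logic is far more elaborate):

```python
# Very rough sketch of the FDM restart decision described above.
# hosts: list of dicts with 'name' and 'unreserved_gb'; vm: dict describing the protected VM.
def try_restart(vm, hosts, drs_defragment):
    required_gb = vm["overhead_gb"]
    if vm.get("reservation_gb"):              # hard SLA: reservation + overhead must fit
        required_gb += vm["reservation_gb"]
    for host in hosts:                        # host admission control, evaluated per host
        if host["unreserved_gb"] >= required_gb:
            return f"restarted on {host['name']}"
    # No host can satisfy the (overhead) reservation: ask DRS to defragment the cluster.
    return drs_defragment(vm, hosts)

hosts = [{"name": "esx01", "unreserved_gb": 2}, {"name": "esx02", "unreserved_gb": 12}]
print(try_restart({"overhead_gb": 0.3, "reservation_gb": 8}, hosts,
                  lambda vm, hosts: "defragmentation requested"))   # restarted on esx02
```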

But remember, if a virtual machine has a soft SLA, HA will restart the virtual machine regardless of whether there is enough capacity to run the virtual machine with adequate performance for its users. This behavior is covered in depth in the article "HA admission control is not a capacity management tool". To ensure virtual machine performance during a host failure, one must focus on capacity planning and/or the configuration of resource reservations.

Part 2 of this series will take a closer look at how to configure a proper percentage value that avoids memory overcommitment.

Filed Under: HA

Have you signed up for the Benelux Software Defined Datacenter Roadshow yet?

February 14, 2013 by frankdenneman

In less than three weeks' time, the Benelux Software Defined Datacenter Roadshow starts. Industry-recognized experts from both IBM and VMware share their vision and insights on how to build a unified datacenter platform that provides automation, flexibility and efficiency to transform the way you deliver IT. Not only can you attend their sessions and learn how to abstract, pool and automate your IT services, the SDDC roadshow also provides you the opportunity to meet the experts, sit down and discuss technology.
The speakers and their field of expertise:
VMware
Frank Denneman – Resource Management Expert
Cormac Hogan – Storage Expert
Kamau Wanguhu – Software Defined Networking Expert
Mike Laverick – Cloud Infrastructure Expert
Ton Hermes – End User Computing Expert
IBM
Tikiri Wanduragala – IBM PureSystems Expert
Dennis Lauwers – Converged Systems Expert
Geordy Korte – Software Defined Networking Expert
Andreas Groth – End User Computing Expert
The roadshow is held in three different countries:
Netherlands – IBM forum in Amsterdam – March 5th 2013
Belgium – IBM forum in Brussels – March 7th 2013
Luxembourg – March 8th 2013
The Software Defined Datacenter Roadshow is a full-day event and, best of all, it is free!
Sign up now!

Filed Under: VMware

VCD and initial placement of virtual disks in a Storage DRS datastore cluster

February 13, 2013 by frankdenneman

Recently a couple of consultants brought some unexpected behavior of vCloud Director to my attention. If the provider vDC is connected to a datastore cluster and a virtual disk or vApp is placed in the datastore cluster, vCD displays an error when the individual datastores do not have enough free space available.
Last year I wrote an article (Storage DRS initial placement and datastore cluster defragmentation) describing the Storage DRS initial placement engine and its ability to move virtual machines around the datastore cluster if individual datastores do not have enough free space to store the virtual disk.
I couldn't figure out why Storage DRS did not defragment the datastores in order to place the vApp, so I asked the engineers about this behavior. It turns out that this behavior is by design. When creating vCloud Director, the engineers optimized the initial placement engine of vCD for speed. When deploying a virtual machine, defragmenting a datastore cluster can take some time. To avoid the wait, vCD reports a not-enough-free-space error and relies on the vCloud administrator to manage and correct the storage layer. In other words, Storage DRS initial placement datastore cluster defragmentation is disabled in vCloud Director.
I can understand the choice the vCD engineers made, but I also believe in the benefit of datastore cluster defragmentation. I'm interested in your opinion: would you trade initial placement speed for reduced storage management?

Filed Under: Storage DRS

Using Remote desktop connection on a Mac? Switch to CoRD

February 13, 2013 by frankdenneman

One of the benefits of working for VMware technical marketing is that you have your own lab.
Luckily my lab is hosted by an external datacenter, which helps me avoid a costly power-bill at home each month 🙂 However, that means I need to connect to my lab remotely.
As a Mac user I used Remote Desktop Connection for Mac from Microsoft. One of the limiting factors of this RDP client for Mac is its maximum resolution of 1400 x 1050 px. The screens at home have a minimum resolution of 2560 x 1440 px. This first world problem bugged me until today!
Today I found CoRD – http://cord.sourceforge.net/. CoRD allows me to connect to my servers with a resolution of 2500 x 1600, using the full potential of my displays at home.
[Screenshot: CoRD]
Another great option is the hotkey function: using a key combination, I can spin up a remote desktop connection. I love these kinds of shortcuts that help me reduce the time spent navigating through the UI.
If you are using a Mac and often RDP into your lab, I highly recommend downloading CoRD.
Btw, it’s free 😉

Filed Under: Miscellaneous

Expandable reservation on resource pools, how does it work?

February 12, 2013 by frankdenneman

The expandable reservation setting of a resource pool appears to be shrouded in mystery. How does it work, what is it for, and what does it really expand? Expandable reservation allows the resource pool to allocate physical resources (CPU/memory) protected by a reservation from its parent to satisfy the reservations of its child objects. Let's dig a little deeper into this.
Parent-child relation
A resource pool provides resources to its child objects. A child object can either be a virtual machine or a resource pool. This is what is called the parent-child relationship. If a resource pool (A) contains a resource pool (B), which contains a resource pool (C), then C is the child of B, B is the parent of C but the child of A, and A is the parent of B. There is no terminology for the relation between A and C, as A only provides resources to B; it does not care whether B provides any resources to C.
[Figure: parent-child relationship]
When a virtual machine is placed into a resource pool, the virtual machine becomes a child object of the resource pool. It is the responsibility of the resource pool to provide the resources the virtual machine requires. If a virtual machine is configured with a reservation, then it will request the physical resources from its parent resource pool.
[Figure: parent-child relationship with a virtual machine]
Remember that a reservation guarantees that the resources protected by the reservation are available to the virtual machine and cannot be reclaimed by the VMkernel, even during memory pressure. Therefore the reservation of the virtual machine is directed to its parent, and the parent must exclusively provide these resources to the virtual machine. It can only provide these resources from its own pool of protected resources; the resource pool can only distribute the resources it has obtained itself.
Protected or reserved resources?
I'm deliberately calling a resource claimed by a reservation a protected resource, as the VMkernel cannot reclaim it. However, when a resource pool is configured with a reservation, it immediately claims this memory from its parent. This goes on all the way up to the cluster level. The cluster is the root resource pool, and all the resources provided by the ESXi hosts are owned by the root resource pool and protected by a reservation. Therefore the cluster – the root resource pool – contains and manages the protected pool of resources.
[Figure: parent-child protected resource distribution]
For example, the cluster has 100 GB of resources, meaning that the root resource pool consists of 100 GB of protected memory. Resource pool A is configured with a 50 GB reservation, consuming this 50 GB from the root resource pool. Resource pool B is configured with a 30 GB reservation, immediately claiming 30 GB of the resources protected by the reservation of resource pool A, leaving resource pool A with only 20 GB of protected resources for itself. Resource pool C is configured with a 20 GB memory reservation. Resource pool C claims this from its parent, resource pool B, which is left with 10 GB of protected resources for itself.
But what happens if the resource pool runs out of protected resources, or is not configured with a reservation at all? In other words, if the child objects in the resource pool are configured with reservations that exceed the reservation set on the resource pool, the resource pool needs to request protected resources from its parent. This can only be done if expandable reservation is enabled.
Please note that the resource pool requests protected resources; it will not accept resources that are not protected by a reservation.
[Figure: resource pool B protected reservation request]
Now in this scenario, the five virtual machines in the resource pool are each configured with a 5 GB memory reservation, totaling 25 GB. Resource pool C is configured with a 20 GB memory reservation. Therefore resource pool C is required to request 5 GB of protected memory resources from its parent, resource pool B, on behalf of the virtual machines.
If resource pool B does not have the protected resources itself, it can request these protected resources from its parent. This can only occur when the resource pool is configured with expandable reservation enabled. The last stop in the chain is the cluster itself. What can stop this river of requests? Two things: the request for protected resources is stopped by a resource limit or by a disabled expandable reservation. If a resource pool has expandable reservation disabled, it will try to satisfy the reservation itself; if it's unable to do so, it will deny the reservation request. If a resource pool is set with a limit, the resource pool is limited to that amount of physical resources.
For example, if the parent resource pool has a reservation and a limit of 20 GB, the reservation on behalf of its child needs to be satisfied by its own protected pool; otherwise it will deny the resource request.
Now let's use a more complex scenario: resource pool B is configured with expandable reservation enabled, a 30 GB reservation, and a limit of 35 GB. Resource pool C is requesting an additional 10 GB on top of the 20 GB it has already been granted. Resource pool B is running 2 VMs with a total reservation of 10 GB. This means the protected pool of resource pool B is servicing a 20 GB resource request from resource pool C and 10 GB for its own virtual machines. Its protected pool is depleted, so the additional 10 GB request of resource pool C is denied, as this would raise the protected pool of resource pool B to a total of 40 GB of memory, which exceeds the 35 GB limit.
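To tie the scenarios together, here is a minimal sketch of the request flow described in this post (an illustration only, not the actual resource manager implementation): a child claims protected resources from its parent; when a pool's own protected pool is depleted, it can only expand the request upwards if expandable reservation is enabled, and never beyond its limit.

```python
# Illustration only: the expandable reservation request flow described in this post.
class ResourcePool:
    def __init__(self, name, reservation_gb=0, limit_gb=None, expandable=True, parent=None):
        self.name = name
        self.expandable = expandable
        self.limit_gb = limit_gb
        self.parent = parent
        self.reserved_gb = reservation_gb   # protected resources this pool has acquired
        self.used_gb = 0                    # protected resources handed out to child objects
        # A pool's own reservation is immediately claimed from its parent.
        if parent is not None and not parent.request(reservation_gb):
            raise RuntimeError(f"{name}: reservation cannot be satisfied by parent")

    def request(self, amount_gb):
        """A child object (VM or child pool) asks this pool for protected resources."""
        shortfall = self.used_gb + amount_gb - self.reserved_gb
        if shortfall > 0:
            # Growing the protected pool must not exceed the configured limit.
            if self.limit_gb is not None and self.reserved_gb + shortfall > self.limit_gb:
                return False
            # Only an expandable pool may ask its parent for the shortfall.
            if not self.expandable or self.parent is None or not self.parent.request(shortfall):
                return False
            self.reserved_gb += shortfall
        self.used_gb += amount_gb
        return True

# The scenario above: B (30 GB reservation, 35 GB limit) under the cluster, C (20 GB) under B.
root = ResourcePool("cluster", reservation_gb=100)   # root resource pool: 100 GB protected
b = ResourcePool("B", reservation_gb=30, limit_gb=35, parent=root)
c = ResourcePool("C", reservation_gb=20, parent=b)
c.request(20)           # C's virtual machines consume its own 20 GB reservation
b.request(10)           # B's own two virtual machines reserve 10 GB
print(c.request(10))    # False: B would have to grow to 40 GB, which exceeds its 35 GB limit
```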
Virtual machine memory overhead
Please remember that each virtual machine comes with a memory overhead reservation: to run the virtual machine, a small amount of memory is required by the VMkernel. This is called the virtual machine memory overhead. To be able to run a virtual machine inside a resource pool, either expandable reservation should be enabled or a sufficient memory reservation must be configured on the resource pool.

Filed Under: DRS
