Implicit anti-affinity rules and DRS placement behavior

February 19, 2013 by frankdenneman

Yesterday I had an interesting conversation with a colleague about affinity rules and whether DRS reviews the complete state of the cluster and its affinity rules when placing a virtual machine. The following scenario was used to illustrate the question:

The following affinity rules are defined:
1. VM1 and VM2 must stay on the same host
2. VM3 and VM4 must stay on the same host
3. VM1 and VM3 must NOT stay on the same host
If VM1 and VM3 are deployed first, everything will be fine, because VM1 and VM3 will be placed on two different hosts, and VM2 and VM4 will also be placed accordingly.
However, if VM1 is deployed first and then VM4, there isn’t an explicit rule saying these two need to be on separate hosts; this is only implied by the dependencies of the three rules created above. Would DRS be intelligent enough to recognize this? Or will it place VM1 and VM4 on the same host, so that by the time VM3 needs to be placed, there is a clear deadlock?

The situation where it’s not logical to place VM4 and VM1 on the same host can be deemed an implicit anti-affinity rule. It’s not a real rule, but if all virtual machines are operational, VM4 should not be on the same host as VM1. DRS doesn’t react to these implicit rules. Here’s why:
When provisioning a virtual machine, DRS sorts the available hosts on utilization first. Then it goes through a series of checks, such as the compatibility between the virtual machine and the host: does the host have a connection to the datastore? Is the vNetwork available on the host? Then it checks whether placing the virtual machine violates any constraints. A constraint can be a VM-VM affinity/anti-affinity rule or a VM-Host affinity/anti-affinity rule.
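To make that order concrete, here is a minimal sketch in Python. The data model (host names, a utilization map, rule dictionaries, a compatibility callback) is hypothetical and this is not actual DRS code; it only illustrates the sequence described above: sort on utilization, filter on compatibility, then reject candidates that violate a currently defined rule.

```python
# Minimal sketch of the placement order described above; hypothetical data model, not DRS code.
def place_vm(vm, hosts, utilization, rules, vms_on_host, compatible):
    """hosts: list of host names; utilization: host -> current load;
    rules: list of {'type': 'affinity' | 'anti-affinity', 'vms': set of VM names};
    vms_on_host: host -> set of running VM names; compatible: callback (vm, host) -> bool."""
    host_of = {v: h for h, vms in vms_on_host.items() for v in vms}
    # 1. Sort candidate hosts on utilization, lowest first.
    for host in sorted(hosts, key=lambda h: utilization[h]):
        # 2. Compatibility checks (datastore connection, vNetwork availability, ...).
        if not compatible(vm, host):
            continue
        # 3. Constraint checks: only the rules that exist right now are evaluated.
        violation = False
        for rule in rules:
            if vm not in rule['vms']:
                continue
            peers = rule['vms'] - {vm}
            if rule['type'] == 'affinity' and any(host_of.get(p, host) != host for p in peers):
                violation = True   # a must-stay-together peer runs on another host
            if rule['type'] == 'anti-affinity' and peers & vms_on_host[host]:
                violation = True   # a must-stay-apart peer already runs on this host
        if not violation:
            return host            # lowest-utilized host that satisfies all current rules
    return None
```

With only VM1 running and the three rules above defined, this check happily co-locates VM4 with VM1; only when VM3 comes along does no host pass step 3, which is the point at which DRS resolves the conflict by migrating VM4.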
In the scenario where VM1 is running, DRS is safe to place VM4 on the same host, as it does not violate any affinity rule. When DRS wants to place VM3, it determines that placing VM3 on the same host as VM4 (and thus VM1) violates the anti-affinity rule between VM1 and VM3. Therefore it will migrate VM4 the moment VM3 is deployed.
During placement DRS only checks the current affinity rules and determines whether placement violates any of them. If not, the host with the most connections and the lowest utilization is selected. DRS cannot be aware of any future power-on operations; there is no vCrystal ball. The next power-on operation might be one minute away or four days away. By allowing DRS to select the best possible placement, the virtual machine is provided an operating environment that has the most resources available at that time. If DRS took all possible placement configurations into account, it could either end up in gridlock or place the virtual machine on a higher-utilized host for a long time just to prevent a vMotion operation of another virtual machine to satisfy an affinity rule. All that time the virtual machine could be performing better if it were placed on a lower-utilized host. In the long run, dealing with constraints the moment they occur is far more economical.
Similar behavior occurs when creating a rule. DRS will not display a warning when you create a collection of rules that creates a conflict once all virtual machines are powered on. As DRS is unaware of the intentions of the user, it cannot throw a warning. Maybe the virtual machines will not be powered on in the current cluster state, or maybe the ruleset is in preparation for new hosts that will be added to the cluster shortly. Also understand that if a host is in maintenance mode, it is considered external to the cluster: it does not count as a valid destination and its resources are not used in the equation. However, we as users still see the host as part of the cluster. If those rule sets were created while a host is in maintenance mode, then according to the previous logic DRS would have to throw an error, while the user assumes the rules are correct because the cluster provides enough placement options. As clusters can grow and shrink dynamically, DRS deals with violations only when the rules become active, and that is during power-on operations (DRS placement).

Filed Under: DRS

HA Percentage based admission control from a resource management perspective – Part 1

February 15, 2013 by frankdenneman

Disclaimer: This article contains references to the words master and slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment.

HA admission control is quite challenging to understand, as it interacts with multiple layers of resource management. In the upcoming series of articles I want to focus on HA percentage-based admission control and how it interacts with vCenter and host management. Let’s cover the basics first before diving into percentage-based admission control.

Virtual machine service level agreements
HA provides the service of restarting a virtual machine when the host it’s running on fails. That’s what HA is designed to do. However, HA admission control is tightly interlinked with virtual machine reservations, because a reservation is a hard SLA for the entire virtual infrastructure. Let’s focus on the two different “service level agreements” you can define for your virtual machine.

Share-based priority: A share-based priority allows the virtual machine to get all the resources it demands until demand exceeds supply on the ESXi host. Depending on the number of shares and the activity, the resource manager determines the relative priority for resource access. Resources will be reclaimed from inactive virtual machines and distributed to high-priority active virtual machines. If all virtual machines are active, the share values determine the distribution of resources. Let’s coin the term “Soft SLA” for share-based priority, as the resource manager allows a virtual machine to be powered on even if there are not enough resources available to provide an adequate user experience/performance of the application running inside the virtual machine.

The resource manager just distributes resources based on the share values set by the administrator; it expects that the shares were set correctly to provide an adequate performance level at all times.
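To illustrate the idea, here is a simplified sketch of proportional, share-based distribution under contention. This is not the actual ESXi scheduler, and the share and demand values are made up; it only shows how active virtual machines receive resources in proportion to their shares when demand exceeds supply.

```python
# Simplified sketch of share-based distribution under contention (not the ESXi scheduler).
def distribute(capacity_mb, vms):
    """vms: dict of name -> {'shares': int, 'demand': int (MB of active demand)}."""
    total_shares = sum(v['shares'] for v in vms.values() if v['demand'] > 0)
    # Each active VM gets capacity proportional to its shares, capped at its demand.
    return {name: min(v['demand'], capacity_mb * v['shares'] // total_shares)
            for name, v in vms.items()}

# Two active VMs with 2000 vs 1000 shares contending for 12 GB:
print(distribute(12 * 1024, {
    'vm-high': {'shares': 2000, 'demand': 16 * 1024},
    'vm-low':  {'shares': 1000, 'demand': 16 * 1024},
}))  # vm-high ends up with ~8 GB, vm-low with ~4 GB
```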

Reservation-based priority: Reservations can be defined as a “Hard SLA”. Under all circumstances, the resources protected by a reservation must be available to that particular virtual machine. Even if every other virtual machine is in need of resources, the resource manager cannot reclaim these resources, as it is restricted by this hard SLA. It must provide. In order to meet the SLA, the host checks during the power-on operation whether it has enough free unreserved resources available. If it doesn’t have the necessary unreserved resources available, the host cannot meet the hard SLA and therefore rejects the virtual machine. Go look somewhere else buddy 😉
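A minimal sketch of that power-on check, assuming a simple host model with total capacity and already-reserved memory in MB. This is not the actual VMkernel admission control code; the figures in the example are hypothetical.

```python
# Sketch of host admission control for a reservation (hard SLA); hypothetical numbers.
def host_admits(vm_reservation_mb, vm_overhead_mb, host_capacity_mb, host_reserved_mb):
    unreserved_mb = host_capacity_mb - host_reserved_mb
    # The host must be able to back the VM reservation plus its overhead reservation.
    return unreserved_mb >= vm_reservation_mb + vm_overhead_mb

# A host with 96 GB of which 80 GB is already reserved rejects a VM asking for
# a 20 GB reservation (plus ~300 MB overhead): go look somewhere else, buddy.
print(host_admits(20 * 1024, 300, 96 * 1024, 80 * 1024))   # False
```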

Percentage based admission control
The percentage-based admission control is my favorite HA admission control policy because it gets rid of the overly conservative slot mechanism used by the “host failures cluster tolerates” policy. With percentage-based admission control, you set a percentage of the cluster resources that will be reserved for failover capacity.

For example, when you set the reserved failover memory capacity to 25%, 25% of the cluster resources are reserved. In a 4-host cluster this makes sense, as the 25% embodies the available resources of a single host; thus 25% equals a failover tolerance of one host. If one host fails, the remaining three hosts can restart the virtual machines that were running on the failed host.

[Figure: failover capacity]
For the sake of simplicity, this diagram shows an equal load distribution; due to VM reservations and other factors, the actual distribution of virtual machines might differ.

Let’s take a closer look at that 25% and that 75%. The 25% of reserved failover memory capacity is not reserved on a per-host basis; therefore the previous diagram is not completely accurate. This failover capacity is tracked and enforced by HA at the vCenter layer, or to be more precise, at the HA cluster level. This is crucial to understanding the difference between admission control during normal provisioning/power-on operations and admission control during restart operations performed by HA.

25% reserved failover memory capacity
The resource allocation tab of the cluster shows what’s happening after enabling HA. The first screenshot shows the resource allocation of the cluster before HA is enabled; notice the 1GB reservation.

[Figure: cluster resource allocation before HA]
[Figure: reserved failover memory capacity]
When setting the reserved failover memory capacity to 25%, the following happens:
[Figure: cluster resource allocation with HA enabled]
25% of the cluster capacity (363.19 * 0.25 = 90.79GB) is added to the reserved capacity on top of the existing 1.03GB, bringing the total reserved capacity to 91.83GB. This means that the cluster has 271.37GB of available capacity left. What exactly is this available capacity? It’s the capacity that’s often referred to as “unreserved capacity”. What happens to this capacity when we power on a 16GB virtual machine without a reservation? Will it reduce the available capacity to 255.37GB? No, it will not. This graph shows only how much of the total capacity is assigned to objects with a hard SLA (reservations).
[Figure: cluster reserved capacity]
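As a quick sanity check, here is the same arithmetic in a few lines of Python. The values are taken from the screenshots above; the UI rounds each figure, so the last decimal can differ slightly.

```python
# Reproducing the reserved-capacity math from the screenshots above (values in GB).
cluster_capacity = 363.19
existing_reservations = 1.03
failover_pct = 0.25

failover_capacity = cluster_capacity * failover_pct             # ≈ 90.8 GB
reserved_capacity = failover_capacity + existing_reservations   # ≈ 91.83 GB
available_capacity = cluster_capacity - reserved_capacity       # ≈ 271.4 GB "unreserved"

# Powering on a 16 GB VM *without* a reservation does not lower this number;
# only hard SLAs (reservations) and the HA failover capacity count against it.
print(round(failover_capacity, 2), round(reserved_capacity, 2), round(available_capacity, 2))
```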
Thus, when a virtual machine is powered on or provisioned into the cluster via vCenter by the user, it goes through HA admission control first:
[Figure: HA admission control]
After HA accepts the virtual machine, DRS admission control and host admission control review the virtual machine before powering it on. The article “Admission control family” describes the admission control workflow in depth.
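A hypothetical sketch of that chain of checks for a user-initiated power-on; the three check functions are placeholders passed in by the caller, not VMware APIs:

```python
# Hypothetical sketch of the user power-on workflow: HA first, then DRS, then the host.
def user_power_on(vm, cluster, ha_check, drs_select_host, host_check, power_on):
    if not ha_check(vm, cluster):                 # HA admission control at cluster level
        raise RuntimeError("HA admission control: insufficient failover capacity")
    host = drs_select_host(vm, cluster)           # DRS admission control / placement
    if host is None or not host_check(vm, host):  # host admission control (hard SLAs)
        raise RuntimeError("Host admission control: insufficient unreserved capacity")
    power_on(vm, host)
```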

75% unreserved capacity
What happened to that 90GB? Is it gone? Is a part of this capacity reserved on each host in the cluster and unavailable for virtual machines to use? No, luckily the 90GB is not gone. HA just reduced the available capacity so that during a placement operation (deployment or power-on of an existing VM) vCenter knows whether the cluster can meet the hard SLA of a reservation. To illustrate this behavior I took a screenshot of the ESXtop output of a host:

[Figure: ESXtop output]
In this capture, you can see that the host is serving 5 virtual machines (W2K8-00 to W2K8-04). Each virtual machine is configured with 16GB (MEMSZ) and the resource manager has assigned a size target above those 16GB (SZTGT). This size target is the amount of memory the resource manager has allocated. The reason it’s higher than the memory size is the overhead memory reservation: the memory needed by the VMkernel to run the virtual machine. As you can see, these 5 virtual machines use up 82GB, which is more than the 67.5GB the host is supposed to have available if 25% were reserved as failover capacity on each host.
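A small worked example of why SZTGT exceeds the configured memory size. The overhead value below is a guess chosen to roughly match the ~82GB in the capture; the actual overhead reservation varies per virtual machine.

```python
# SZTGT = configured memory + VMkernel overhead reservation (overhead value is hypothetical).
memsz_mb = 16 * 1024            # configured VM memory (MEMSZ, 16 GB)
overhead_mb = 410               # assumed VMkernel overhead reservation for this VM
sztgt_mb = memsz_mb + overhead_mb

vms = 5
total_gb = vms * sztgt_mb / 1024
print(round(total_gb, 1))       # ~82.0 GB for the five VMs in the ESXtop output
```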

Failover process and the role of host admission control
This is the key to understanding why HA “ignores” reserved failover capacity during a failover process. As HA consists of FDM agents running on each host, it is the master FDM agent that reviews the protected list and initiates a power-on operation for any virtual machine that is listed as protected but is not running. The FDM agent ensures that the virtual machines are powered on. As you can see, this all happens on a host level; vCenter is not included in this party. Therefore the virtual machine start-up operation is reviewed by host admission control. If the virtual machine is configured with a soft SLA, host admission control only checks whether it can satisfy the VM overhead reservation. If the VM is protected by a VM reservation, host admission control checks whether it can satisfy both the VM reservation and the VM overhead reservation. If it cannot, it will fail the startup and FDM has to find another host that can run this virtual machine. If all hosts fail to power on the virtual machine, HA will request DRS to “defragment” the cluster by moving virtual machines around to make room on a host and free up some unreserved capacity.
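A conceptual sketch of that restart flow. The helper callables (host admission check, power-on, DRS defragmentation request) are placeholders; this is not actual FDM code.

```python
# Conceptual sketch of the HA restart flow described above (hypothetical helpers, not FDM code).
def restart_protected_vms(protected_list, running, hosts,
                          host_admits, power_on, ask_drs_to_defragment):
    for vm in protected_list:
        if vm in running:
            continue                      # protected and already running: nothing to do
        candidate = next((h for h in hosts if host_admits(vm, h)), None)
        if candidate is not None:
            # Host admission control passed: overhead reservation only for a soft-SLA VM,
            # VM reservation plus overhead for a hard-SLA VM.
            power_on(vm, candidate)
        else:
            # Every host rejected the power-on: ask DRS to defragment the cluster
            # to free up unreserved capacity, after which HA retries.
            ask_drs_to_defragment(vm)
```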

But remember, if a virtual machine has a soft SLA, HA will restart the virtual machine regardless of whether there is enough capacity to run it with adequate performance for the users. This behavior is covered in depth in the article “HA admission control is not a capacity management tool”. To ensure virtual machine performance during a host failure, one must focus on capacity planning and/or the configuration of resource reservations.

Part 2 of this series will take a closer look at how to configure a proper percentage value that avoids memory overcommitment.

Filed Under: HA

Have you signed up for the Benelux Software Defined Datacenter Roadshow yet?

February 14, 2013 by frankdenneman

In less than three weeks’ time, the Benelux Software Defined Datacenter Roadshow starts. Industry-recognized experts from both IBM and VMware share their vision and insights on how to build a unified datacenter platform that provides automation, flexibility and efficiency to transform the way you deliver IT. Not only can you attend their sessions and learn how to abstract, pool and automate your IT services, the SDDC roadshow also provides the opportunity to meet the experts, sit down and discuss technology.
The speakers and their field of expertise:
VMware
Frank Denneman – Resource Management Expert
Cormac Hogan – Storage Expert
Kamau Wanguhu – Software Defined Networking Expert
Mike Laverick – Cloud Infrastructure Expert
Ton Hermes – End User Computing Expert
IBM
Tikiri Wanduragala – IBM PureSystems Expert
Dennis Lauwers – Converged Systems Expert
Geordy Korte – Software Defined Networking Expert
Andreas Groth – End User Computing Expert
The roadshow is held in three different countries:
Netherlands – IBM forum in Amsterdam – March 5th 2013
Belgium – IBM forum in Brussels – March 7th 2013
Luxembourg – March 8th 2013
The Software Defined Datacenter Roadshow is a full-day event and, best of all, it is free!
Sign up now!

Filed Under: VMware

VCD and initial placement of virtual disks in a Storage DRS datastore cluster

February 13, 2013 by frankdenneman

Recently a couple of consultants brought some unexpected behavior of vCloud Director to my attention. If the provider vDC is connected to a datastore cluster and a virtual disk or vApp is placed on it, vCD displays an error when the individual datastores do not have enough free space available.
Last year I wrote an article (Storage DRS initial placement and datastore cluster defragmentation) describing the Storage DRS initial placement engine and its ability to move virtual machines around the datastore cluster if individual datastores do not have enough free space to store the virtual disk.
I couldn’t figure out why Storage DRS did not defragment the datastores in order to place the vApp, so I asked the engineers about this behavior. It turns out that this behavior is by design. When creating vCloud Director, the engineers optimized the initial placement engine of vCD for speed. When deploying a virtual machine, defragmenting a datastore cluster can take some time. To avoid waiting, vCD reports a not-enough-free-space error and relies on the vCloud administrator to manage and correct the storage layer. In other words, Storage DRS initial placement datastore cluster defragmentation is disabled in vCloud Director.
I can understand the choice the vCD engineers made, but I also believe in the benefit of having datastore cluster defragmentation. I’m interested in your opinion: would you trade initial placement speed for reduced storage management?

Filed Under: Storage DRS

Using Remote desktop connection on a Mac? Switch to CoRD

February 13, 2013 by frankdenneman

One of the benefits of working for VMware technical marketing is that you have your own lab.
Luckily my lab is hosted in an external datacenter, which helps me avoid a costly power bill at home each month 🙂 However, that means I need to connect to my lab remotely.
As a Mac user I used Remote Desktop Connection for Mac from Microsoft. One of the limiting factors of this RDP client for Mac is the resolution limit of 1400 x 1050 px. The screens at home have a minimum resolution of 2560 x 1440 px. This first-world problem bugged me until today!
Today I found CoRD – http://cord.sourceforge.net/. CoRD allows me to connect to my servers at a resolution of 2500 x 1600 px, using the full potential of my displays at home.
[Screenshot: CoRD]
Another great option is the hotkey function: using a key combination, I can spin up a remote desktop connection. I love these kinds of shortcuts that help me reduce the time spent navigating through the UI.
If you are using a Mac and often RDP into your lab, I highly recommend downloading CoRD.
Btw, it’s free 😉

Filed Under: Miscellaneous

