
Resource Pools and Sibling Rivalry

One of the most powerful constructs in the Software-Defined Data Center is the resource pool. Resource pools allow you to abstract and isolate cluster compute resources. Unfortunately, the construct is mostly misunderstood and has earned a bad reputation it cannot seem to get rid of.

One of the challenges of using resource pools is fully committing to them. Placing virtual machines at the same level as resource pools can have an impact on resource distribution. This article zooms in on sibling rivalry.

But before this adventure begins, I would like to stress that the examples provided in this article are worst-case scenarios in which all VMs are 100% active. That is an uncommon situation, but it makes the resource distribution easy to explain. Later in the article, I use a few examples in which some VMs are active and some are idle, and as you will see, resource pools aren't that bad after all.

Resource Pool Size
Because resource pool shares are relative to other resource pools or virtual machines with the same parent resource pool, it is important to understand how vCenter sizes resource pools.

The CPU and memory share values applied to resource pools are similar to those of virtual machines. By default, a resource pool is sized like a virtual machine with 4 vCPUs and 16 GB of RAM. Depending on the selected share level, a predefined number of shares is issued. Similar to VMs, four share levels can be selected: three predefined settings, High, Normal, and Low, which specify share values in a 4:2:1 ratio, and the Custom setting, which can be used to specify a different relative relationship.

Share Level   Shares of CPU   Shares of Memory
Low           2,000           81,920
Normal        4,000           163,840
High          8,000           327,680

Caution must be taken when placing VMs at the same hierarchical level as resource pools, as VMs can end up with a higher priority than intended. For example, in vSphere 6.7 the largest virtual machine can be equipped with 128 vCPUs and 6 TB of memory. Such a VM owns 256,000 (128 × 2,000) CPU shares and 122,560,000 (6,128,000 × 20) memory shares. Comparing this VM to a High-level resource pool results in a CPU ratio of 32:1 and a memory ratio of 374:1. This is an extreme example, but the reality is that a 4-vCPU, 16 GB VM is not uncommon anymore. Placing such a VM next to a resource pool results in unfair sibling rivalry.
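The arithmetic above can be verified with a short sketch. The per-unit share values follow the article (a High-share VM owns 2,000 CPU shares per vCPU and 20 memory shares per MB; a High-share resource pool owns 8,000 CPU and 327,680 memory shares); the helper function is purely illustrative:

```python
# Share values from the article: High level presets.
CPU_SHARES_PER_VCPU_HIGH = 2000
MEM_SHARES_PER_MB_HIGH = 20

def vm_shares(vcpus: int, mem_mb: int) -> tuple[int, int]:
    """CPU and memory shares for a VM at the High share level."""
    return vcpus * CPU_SHARES_PER_VCPU_HIGH, mem_mb * MEM_SHARES_PER_MB_HIGH

pool_cpu, pool_mem = 8000, 327_680          # High-level resource pool defaults
vm_cpu, vm_mem = vm_shares(128, 6_128_000)  # monster VM from the example

print(vm_cpu, vm_mem)            # 256000 122560000
print(vm_cpu // pool_cpu)        # 32  -> 32:1 CPU ratio
print(round(vm_mem / pool_mem))  # 374 -> ~374:1 memory ratio
```

Even a modest 4-vCPU, 16 GB VM at High (`vm_shares(4, 16_384)`) owns the same 8,000 CPU shares as the High-level pool itself, which illustrates the sibling rivalry problem.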

The Family Tree of Resource Consumers
As shares determine the priority of the resource pool or virtual machine relative to its siblings, it is important to determine which objects compete for priority.

In the scenario depicted above, multiple sibling levels are present. VM01 and Resource Pool-1 are child objects of the cluster and therefore are on the same sibling level. VM02 and VM03 are child objects of Resource Pool-1. VM02 and VM03 are siblings, and both compete for resources provided by Resource Pool-1. DRS compares their share values to each other. The share values of VM01 and the other two VMs cannot be compared with each other because they each have different parents and thus do not experience sibling rivalry.

Shares indicate the priority at that particular hierarchical level, but the relative priority of the parent at its level determines the availability of the total amount of resources.

VM01 is a 2-vCPU 8GB virtual machine. The share value of Resource Pool-1 is set to high. As a result, the resource pool owns 8000 shares of CPU. The share value of VM01 is set to Normal and thus it owns 2000 CPU shares. Contention occurs, and the cluster distributes its resources between Resource Pool-1 and VM01. If both VM02 and VM03 are 100% utilized, Resource Pool-1 receives 80% of the cluster resources based on its share value.

Resource Pool-1 divides its resources between VM02 and VM03. Both child objects own an equal number of shares and therefore each receive 50% of the resources of Resource Pool-1.

This 50% of Resource Pool-1's resources equals 40% of the cluster resources. For now, both VM02 and VM03 are able to receive more resources than VM01. However, three additional VMs are placed inside Resource Pool-1. The new VMs each own 2,000 CPU shares, increasing the total number of outstanding shares inside the pool to 10,000.

The distribution at the first level remains the same during contention. The cluster distributes its resources among its child objects, VM01 and Resource Pool-1: 20% to VM01 and 80% to Resource Pool-1. Please note this only occurs when all objects are 100% utilized.
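The two-level distribution above can be sketched in a few lines of hypothetical Python (the helper and names are illustrative, not DRS internals):

```python
def distribute(total: float, shares: dict) -> dict:
    """Divide `total` among siblings proportionally to their shares."""
    pool = sum(shares.values())
    return {name: total * s / pool for name, s in shares.items()}

# Level 1: the cluster divides 100% between VM01 (2,000 shares)
# and Resource Pool-1 (8,000 shares).
level1 = distribute(100, {"VM01": 2000, "RP-1": 8000})

# Level 2: RP-1 divides its 80% among five VMs of 2,000 shares each.
vms = {f"VM{n:02d}": 2000 for n in range(2, 7)}
level2 = distribute(level1["RP-1"], vms)

print(level1)  # {'VM01': 20.0, 'RP-1': 80.0}
print(level2)  # each of VM02..VM06 receives 16.0 (% of cluster)
```

Before the three extra VMs were added, the level-2 pool contained only VM02 and VM03, and the same function hands each 40% of the cluster; with five siblings, each drops to 16%.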

If VM01 generates only 50% of its load while the VMs in Resource Pool-1 are 100% utilized, the cluster flows the unused resources to the resource pool to satisfy the demand of its child objects.

The dynamic entitlement is adjusted to the actual demand. The VMs inside Resource Pool-1 are equally active; as a result of the reduced activity of VM01, they each receive 2% more of the cluster resources.

VM02, VM03, and VM04 start to idle. The resource pool shifts the entitlement and allocates the cluster resources to the VMs that are active, VM05 and VM06. Due to their sibling rivalry, they each get 50% of the 80% of cluster resources owned by the pool.

Share Levels are Pre-sets, not Classes
A VM that is placed inside the resource pool, or created in a resource pool, does not inherit the share level of the resource pool. When creating a VM or a resource pool, vCenter assigns the Normal share level by default, independent of the share level of its parent.

Think of share levels as presets of share values. Configure a resource pool or virtual machine with the share level set to High, and it gets 2,000 CPU shares per vCPU. A VM configured with the share level set to Low gets 500 CPU shares per vCPU. If that VM has 4 vCPUs, it owns the same number of shares as a 1-vCPU VM with the share level set to High. The two compete with each other based on share amounts, not share level labels.
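As a quick sanity check of the preset math (the helper is hypothetical; the per-vCPU values come from the article):

```python
# Share-level presets as per-vCPU values (High:Normal:Low = 2,000:1,000:500).
CPU_SHARES_PER_VCPU = {"high": 2000, "normal": 1000, "low": 500}

def cpu_shares(vcpus: int, level: str) -> int:
    """Total CPU shares for a VM at a given share level."""
    return vcpus * CPU_SHARES_PER_VCPU[level]

# A 4-vCPU Low VM owns the same number of shares as a 1-vCPU High VM:
print(cpu_shares(4, "low"))   # 2000
print(cpu_shares(1, "high"))  # 2000
```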

Next Article
This article is a primer for a question about which direction we should take with resource pools. This week I shall post a follow-up article that zooms in on possible changes to the share behavior of resource pools in the near future. Stay tuned.

New Fling: DRS Entitlement

I’m proud to announce the latest fling: DRS Entitlement. This fling is built by the performance team and provides insight into the demand and entitlement of the virtual machines and resource pools within a vSphere cluster.



By default, it shows the active CPU and memory consumption, which by itself helps to understand the dynamics within the cluster, especially when you are using resource pools with different share values. In this example, I have two resource pools: one containing the high-value workloads of the organization, and one containing virtual machines used for test and dev operations. The high-value workloads should receive the resources they require at all times.

The What-If functionality allows you to simulate two different scenarios: a 100% demand option and a simulation of resource allocation settings. The screenshot below shows the what-if entitlement: if these workloads generate 100% activity, what resources do they require when they go to the max? This allows you to set the appropriate resource allocation settings, such as reservations and limits, on the resource pools or maybe even on particular virtual machines.

Another option is to specify particular Reservation, Limits, and Shares (RLS) settings to an object. Select the RLS option and select the object you want to use in the simulation.

In this example, I selected the Low Value Workload resource pool and changed the share value setting of the resource pool.



You can verify the new setting before running the analysis. Please note that this is an analysis; it does not affect the resource allocation of the active workload whatsoever. You can simulate different settings and understand the outcome.



Once the correct setting is determined, you can apply it to the object manually, or you can export the PowerCLI one-liner to programmatically change the RLS settings.



Follow the instructions on the Flings website to install it on your vCenter.

I would like to thank Sai Inabattini and Adarsh Jagadeeshwaran for creating this fling and for listening to my input!


Virtually Speaking Podcast #67 Resource Management

Two weeks ago, Pete Flecha (a.k.a. Pedro Arrow) and John Nicholson invited me to their always awesome podcast to talk about resource management. During our conversation, we covered both on-prem resource management and the features of VMware Cloud on AWS that help cater to the needs of your workload.

Being a guest on this podcast is an honour, and time flies talking to these two guys. I hope you enjoy it as much as I did.

vSphere 6.5 DRS and Memory Balancing in Non-Overcommitted Clusters

DRS is over a decade old and is still going strong. DRS is aligned with the premise of virtualization: resource sharing and overcommitment of resources. The goal of DRS is to provide compute resources to the active workload while improving workload consolidation on a minimal compute footprint. However, virtualization has surpassed the original principle of workload consolidation to provide unprecedented workload mobility and availability.

With this change of focus, many customers do not overcommit on memory. A lot of customers design their clusters to contain (just) enough memory capacity to ensure all running virtual machines have their memory backed by physical memory. In this scenario, DRS behavior should be adjusted, as it traditionally focuses on active memory use.

vSphere 6.5 provides this option in the DRS cluster settings. By ticking the box “Memory Metric for Load Balancing” DRS uses the VM consumed memory for load-balancing operations.

Please note that DRS then focuses on consumed memory, not configured memory! DRS always keeps a close eye on what is actually happening rather than accepting static configuration. Let’s take a closer look at the DRS input metrics of active and consumed memory.

Out-of-the-box DRS Behavior
During load-balancing operations, DRS calculates the active memory demand of the virtual machines in the cluster. The active memory represents the working set of the virtual machine, i.e. the number of actively used pages in RAM. By using working-set estimation, the memory scheduler determines which of the allocated memory pages are actively used by the virtual machine and which are idle. To accommodate a sudden rapid increase of the working set, 25% of the idle consumed memory is added to the demand. Memory demand also includes the virtual machine’s memory overhead.

Let’s use a 16 GB virtual machine as an example of how DRS calculates the memory demand. The guest OS running in this virtual machine has touched 75% of its memory size since it was booted, but only 35% of its memory size is active. This means that the virtual machine has consumed 12288 MB and 5734 MB of this is used as active memory.

As mentioned, DRS accommodates a percentage of the idle consumed memory to be ready for a sudden increase in memory use. To calculate the idle consumed memory, the active memory (5734 MB) is subtracted from the consumed memory (12288 MB), resulting in a total of 6554 MB idle consumed memory. By default, DRS includes 25% of the idle consumed memory: 6554 MB × 25% ≈ 1639 MB.

The virtual machine has a memory overhead of 90 MB. The memory demand DRS uses in its load balancing calculation is as follows: 5734 MB + 1639 MB + 90 MB = 7463 MB. As a result, DRS selects a host that has 7463 MB available for this machine if it needs to move this virtual machine to improve the load balance of the cluster.
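The worked example can be reproduced with a short Python sketch (the function name is mine; rounding the 25% idle share up to 1639 MB follows the article's figures):

```python
import math

def drs_memory_demand(active_mb: int, consumed_mb: int,
                      overhead_mb: int, idle_pct: float = 0.25) -> int:
    """Default DRS demand: active + 25% of idle consumed + overhead."""
    idle_consumed = consumed_mb - active_mb           # 12288 - 5734 = 6554 MB
    idle_share = math.ceil(idle_consumed * idle_pct)  # ~1639 MB
    return active_mb + idle_share + overhead_mb

# 16 GB VM: 75% consumed (12288 MB), ~35% active (5734 MB), 90 MB overhead.
print(drs_memory_demand(5734, 12288, 90))  # 7463
```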

Memory Metric for Load Balancing Enabled
When the option “Memory Metric for Load Balancing” is enabled, DRS takes the consumed memory plus the memory overhead into account for load-balancing operations. In essence, DRS uses the metric Active + 100% IdleConsumedMemory.
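Under the same assumptions as the worked example above, counting 100% of the idle consumed memory reduces the demand formula to consumed memory plus overhead; a minimal sketch:

```python
def drs_memory_demand_consumed(active_mb: int, consumed_mb: int,
                               overhead_mb: int) -> int:
    """Demand with 'Memory Metric for Load Balancing' enabled:
    active + 100% of idle consumed + overhead == consumed + overhead."""
    idle_consumed = consumed_mb - active_mb
    return active_mb + idle_consumed + overhead_mb

# Same 16 GB VM: 12288 MB consumed, 5734 MB active, 90 MB overhead.
print(drs_memory_demand_consumed(5734, 12288, 90))  # 12378
```

Compared to the default calculation (7463 MB), the consumed-memory metric raises the demand for this VM to 12378 MB, so DRS looks for hosts with considerably more free memory before migrating it.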

The vSphere 6.5 Update 1d UI client gives you better visibility into the memory usage of the virtual machines in the cluster. The memory utilization view can be toggled between active memory and consumed memory.

Recently, Adam Eckerle published a great article that outlines all the improvements of vSphere 6.5 Update 1d. Go check it out. Animated GIF courtesy of Adam.

When reviewing the cluster, it shows that the cluster is pretty much balanced.

When looking at the default view of the sum of virtual machine memory utilization (active memory), it shows that host ESXi02 is busier than the others.

However, since the active memory of each host is less than 20% and each virtual machine is receiving the memory it is entitled to, DRS will not move virtual machines around. Remember, DRS is designed to create as little overhead as possible. Moving a virtual machine to another host just to make active usage look more balanced is a waste of compute cycles and network bandwidth. The virtual machines receive what they demand right now, so why take the risk of moving them?

A different view of the current situation appears when you toggle the graph to use consumed memory.

Now we see a bigger difference in consumed memory utilization: much more than 20% between ESXi02 and the other two hosts. By default, DRS in vSphere 6.5 tries to clear a utilization difference of 20% between hosts. This is called Pair-Wise Balancing. However, since DRS focuses on active memory usage, Pair-Wise Balancing is not triggered by the 20% difference in consumed memory utilization. After enabling the option “Memory Metric for Load Balancing”, DRS rebalances the cluster with the optimal number of migrations (as few as possible) to reduce overhead and risk.
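The Pair-Wise Balancing check can be sketched as follows; the host names and utilization figures are hypothetical, and the helper is an illustration of the 20% threshold, not the actual DRS algorithm:

```python
from itertools import combinations

def pairwise_imbalance(util_pct: dict, threshold: float = 20.0) -> list:
    """Return host pairs whose utilization differs by more than `threshold` points."""
    return [(a, b) for a, b in combinations(util_pct, 2)
            if abs(util_pct[a] - util_pct[b]) > threshold]

# Hypothetical consumed-memory utilization per host:
consumed = {"ESXi01": 45.0, "ESXi02": 78.0, "ESXi03": 47.0}
print(pairwise_imbalance(consumed))  # [('ESXi01', 'ESXi02'), ('ESXi02', 'ESXi03')]
```

Run against the active-memory view (all hosts under 20%), the same check returns an empty list, which is why DRS stays put until the consumed-memory metric is enabled.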

Active versus Consumed Memory Bias
If you design your cluster with no memory overcommitment as a guiding principle, I recommend testing the vSphere 6.5 DRS option “Memory Metric for Load Balancing”. You might want to switch DRS to manual mode first to verify the recommendations.

KB 2104983 explained: Default behavior of DRS has been changed to make the feature less aggressive

Yesterday, a couple of tweets in my timeline discussed the DRS behavior mentioned in KB article 2104983. The article is terse at best, therefore I thought let’s discuss this a little more in-depth.

During normal operations, DRS uses an upper limit of 100% utilization in its load-balancing algorithm. It will never migrate a virtual machine to a host if that migration results in a host utilization of 100% or more. However, this behavior can prolong the time needed to upgrade all the hosts in the cluster when using the cluster maintenance mode feature in vCenter Update Manager (parallel remediation).

parallel remediation

To reduce the overall remediation time, vSphere 5.5 contains an increased limit for cluster maintenance mode and uses a default setting of 150%. This can impact the performance of the virtual machine during the cluster upgrade.

vCenter Server 5.5 Update 2d includes a fix that allows users to override the default and specify a value in the range between 40% and 200%. If no change is made to the setting, the default of 150% is used during cluster maintenance mode.

Please note that normal load balancing behavior in vSphere 5.5 still uses a 100% upper limit for utilization calculation.
