Resource pools memory reservations

After publishing the article “impact of memory reservations” I received a lot of questions about setting memory reservation at resource pool level. It seems there are several common facts about resource pools and memory reservations that are often misunderstood.

Because reservations are used by the VMkernel\DRS resource schedulers and (HA) admission control, the behavior of reservation can be very confusing. Before memory reservation on resource pool is addressed, let look at which mechanisms uses reservations and when reservations are used.

When are reservations actually used besides admission control?
If a cluster is under-committed the VM resource entitlement will be the same as its demand, in other words, the VM will be allocated whatever it wants to consume within its configured limit.
When a cluster is overcommitted, the cluster experiences more resource demand than its current capacity, at this point DRS and the VMkernel will allocate resources based on the resource entitlement of the virtual machine. Resource entitlement is covered later in the article.

Is there any difference between resource pool level and virtual machine level memory reservation?
To keep it short, VM level reservation can be rather evil, it will hoard memory if it has been used by the virtual machine once. Even if the virtual machine becomes idle, the VMkernel will not reclaim this memory and return it to the free memory set. This means that ESX can start swapping and ballooning if no free memory is available for other virtual machines while the owning VM’s aren’t using their claimed reserved memory. It also has influence on the slot size of High availability, for more information about HA slot sizes, please visit the HA deep dive page at yellow-bricks.com. For more information about virtual machine level memory reservation, please read the article “impact of memory management“.

Behavior of resource pool memory reservation
Now setting a memory reservation on a resource pool level has its own weaknesses, but it is much fairer and more along the whole idea of consolidation and sharing than virtual machine memory reservations. RP level reservations are immediately active, but are not claimed. This means it will only subtract the specified amount of memory from the unreserved capacity of the cluster.
RP reservations are used when children of the resource pool uses memory and the system is under contention. Reservations are not wasted and the resources can be used by other virtual machines. Be aware, using and reserving are two distinct concepts! Virtual machines can use the resource, but they cannot reserve this as well if it is already reserved by another item.

It appears that resource pool memory reservations work almost similar to CPU reservations, they won’t let any resource go to waste. And to top it off, resource pool reservations don’t flow to virtual machines, they will not influence HA slot sizes. Which unfortunately can lead to (temporary) performance loss if a host failover occurs. When a virtual machine is restarted by HA they are not restarted in the correct resource pool but in the root resource pool, which can lead to starvation. Until DRS is invoked, the virtual machine need to do it without any memory reservations.

How to use resource pool memory reservation?
Ok so two popular strategies exist when it comes to setting memory reservation on resource pool levels:
1. CPU and Memory reservations within the resource pool is never overcommitted i.e configured memory all VM’s (40Gb) equals reservation (40GB)
2. Percentage of Cluster resources reserved i.e. memory reservation resource pool (20GB) less than configured memory virtual machines inside RP (40GB)

The process of divvying is rather straightforward if the memory reservation equals the configured memory of the virtual machines inside the resource pool. All pages by the virtual machines are backed by machine pages, the resource entitlement is at least as large as its memory reservation.

What I find more interesting is what happens if the resource pool is configured with a memory reservation that is less than all virtual machine configured memory? DRS will divvy memory reservations based on the virtual machine resource entitlement.

So how is resource entitlement calculated?
A virtual machines resource entitlement is based on various statistics and some estimation techniques. DRS computes a resource entitlement for each virtual machine, based on virtual machine and resource pool configured shares, reservations, and limits settings, as well as the current demands of the virtual machines and resource pools, the memory size, its working set and the degree of current resource contention.

Now by setting a reservation on the resource pool level, the virtual machines who are actively using memory profits the most of this mechanism. Basically if no reservation is set on the VM level, the “RP” reservation is granted to all virtual machines inside the resource pool who are actively using memory.
DRS and the VMkernel calculates the resource pool and the virtual machine share levels. Please read the article “the resource pool priority-pie paradox” to get more information about share levels. and use this to specify the virtual machines priority.

Besides the share level the active utilization (working set) and the configured memory size are both accounted when calculating the resource entitlement. Virtual machines who are idling aren’t competing for resources, so they won’t get any new resources. If the memory is also idle the allocation get adjusted by the idle memory tax. Idle memory tax uses a progressive tax rate, the more idle memory a VM has, the more tax it will generate, this is why the configured memory size is also taken into account. (nice ammo if your customer wants to configure the DHCP server with 64GB memory!)

When we create a “Diva” VM (coined by Craig Risinger), that is setting VM level reservations, this allocation setting is passed to the VMkernel. It will subtract the specified amount of the reservation pool of the RP and it will not share it with others, i.e. the Diva VM is a special creature.

As stated above, RP memory reservations flow more than VM-level reservation, it will not claim\hoard memory. So basically when setting a resource pool reservation, reservations are just a part of the computation of the virtual machines resource entitlement. When the host is overcommitted, the memory usage of the virtual machine is either above or below the resource entitlement. If the memory usage exceeds its resource entitlement, the memory is ballooned or swapped from the virtual machine until it is at or below its entitlement.

Disclosure
Now before you think I fabricated this article all by myself I am happily to admit that I’m in the lucky position to work for VMware and to call some of the world brightest minds my colleagues. Kit Colbert, Carl Waldspurger and Chirag Bhatt took the time and explain this theory very thoroughly to me. Luckily
my colleagues and good friends Duncan Epping and Craig Risinger helped me decipher some out-of-this-world emails from the crew above and participated in some excellent discussions.

Comments

  1. Shan says

    Great Article! Thanks for writing this. I am not seeing the “Mem.minfreepct” attribute in ESX4 version. Is this applicable only in ESX3.5 version? How the memory reclaimation process taken place in vSphere4.x version?

    Thanks,

    Shan

  2. Clark Updike says

    Does this guidance apply even to OvDC resource pools in vSphere? In other words, it appears that if I set the “Memory resources guaranteed” on an OvDC to a particular value, in vSphere it applies the memory reservation directly on the VM–which is against your guidance. So presumably I might set this “Memory resources guaranteed” to zero on the OvDC in vCloud, and then apply something similar to the “popular strategies” on the resource pools for the OvDCs directly in vSphere. The only concern is that you often hear not to change anything in vSphere that is created through vCloud (like the OvDCs). But actually in this case, I don’t get the “This entity is managed by solution vCloud Director”. So maybe it is okay to do this way–but I’d like to hear that confirmed before going down that path. Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *