In part 1 of this series on the impact of oversized virtual machines, the NUMA architecture, memory overhead reservation and share levels were reviewed. Part 2 zooms in on the impact of memory overhead reservation and share levels on HA and DRS.
Impact of memory overhead reservation on HA Slot size
The VMware High Availability admission control policy "Host Failures Cluster Tolerates" calculates a slot size to determine the maximum number of virtual machines that can be active in the cluster without violating failover capacity. This admission control policy determines the HA cluster slot size from the largest CPU reservation and the largest memory reservation plus its memory overhead reservation. Even if the virtual machine with the largest reservation has an appropriately sized reservation, when that virtual machine is oversized its memory overhead reservation can still have a substantial impact on the slot size.
The HA admission control policy "Percentage of Cluster Resources Reserved" calculates the memory component of its mechanism by summing the reservation plus the memory overhead of each virtual machine. This allows the memory overhead reservation to have an even bigger impact on admission control than it has in the calculation done by the "Host Failures Cluster Tolerates" policy.
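To make the difference between the two policies concrete, here is a minimal Python sketch of both calculations. The virtual machine names, reservations and overhead values are illustrative assumptions, not output from vCenter.

```python
# Illustrative sketch of how memory overhead reservation feeds into both
# HA admission control policies (all numbers are made up for this example).

vms = [
    # (name, memory reservation in MB, static memory overhead in MB)
    ("web01",  0,   242.51),   # 2 vCPU, 4 GB
    ("db01",   512, 413.91),   # 4 vCPU, 8 GB with a 512 MB reservation
    ("big-vm", 0,   1028.07),  # 8 vCPU, 16 GB, oversized, no reservation
]

# "Host Failures Cluster Tolerates": the memory slot size is the largest
# (reservation + memory overhead) of any powered-on virtual machine.
memory_slot_size = max(res + overhead for _, res, overhead in vms)
print(f"Memory slot size: {memory_slot_size:.2f} MB")  # set by big-vm's overhead alone

# "Percentage of Cluster Resources Reserved": sums reservation + overhead
# of every virtual machine, so every oversized VM inflates the total.
total_reserved = sum(res + overhead for _, res, overhead in vms)
print(f"Reserved memory counted by admission control: {total_reserved:.2f} MB")
```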
DRS initial placement
DRS uses a worst-case scenario during initial placement. Because DRS cannot determine the resource demand of a virtual machine that is not running, it assumes that both the memory demand and the CPU demand are equal to the configured size. Oversizing a virtual machine therefore decreases the options for finding a suitable host for it. If DRS cannot guarantee that the full 100% of the resources provisioned for this virtual machine can be used, it will vMotion other virtual machines away so that it can power on this single virtual machine. If there are not enough resources available, DRS will not allow the virtual machine to be powered on.
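The sketch below illustrates this worst-case reasoning with made-up host capacity numbers: a powered-off virtual machine is treated as if it demands its full configured size, so fewer hosts qualify as placement candidates.

```python
# Illustrative worst-case initial placement check: the demand of a powered-off
# VM is assumed to equal its configured size (host numbers are assumptions).

configured_cpu_mhz = 4 * 2600   # 4 vCPUs on 2.6 GHz cores
configured_mem_mb = 16384       # 16 GB of configured memory

hosts = [
    # (name, available CPU in MHz, available memory in MB)
    ("esx01", 9000, 12000),
    ("esx02", 12000, 20000),
]

candidates = [
    name for name, cpu_mhz, mem_mb in hosts
    if cpu_mhz >= configured_cpu_mhz and mem_mb >= configured_mem_mb
]
print(candidates or "No host can guarantee the full configured size")
```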
Shares and resource pools
When placing a virtual machine inside a resource pool, its shares are relative to the other virtual machines (and resource pools) inside that pool. Shares are relative to all the other components sharing the same parent; an easier way to put it is to call it a sibling share level. Therefore the numeric share values are not directly comparable across pools, because they are children of different parents.

By default a resource pool is configured with an amount of shares equal to that of a 4 vCPU, 16GB virtual machine. As mentioned in part 1, shares are relative to the configured size of the virtual machine, implicitly stating that size equals priority.
Now let's take another look at the image above. The three virtual machines are reparented to the cluster root, next to resource pool 1 and resource pool 2. Suppose they are all 4 vCPU, 16GB machines; their share values are then interpreted in the context of the root pool and they will receive the same priority as resource pool 1 and resource pool 2. This is not only wrong, but also dangerous in a denial-of-service sense: a virtual machine running at the same level as resource pools can suddenly find itself entitled to nearly all cluster resources.
Because of the default share distribution process we always recommend avoiding placing virtual machines at the same level as resource pools. Unfortunately, a virtual machine might be reparented to the cluster root level when it is manually migrated using the GUI; the current workflow defaults to the cluster root level instead of the virtual machine's current resource pool. Because of this it is recommended to increase the number of shares of the resource pool to reflect its priority level. More info about shares on resource pools can be found in Duncan's post on yellow-bricks.com.
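A minimal sketch of the sibling-share math behind this recommendation; the pool contents and share values below are illustrative assumptions based on the default normal share level.

```python
# Illustrative sibling-share comparison at the cluster root level.
# A resource pool with default "normal" shares receives the same amount as a
# 4 vCPU / 16 GB virtual machine: 4 * 1000 CPU shares and 16384 * 10 memory shares.

root_siblings = {
    "resource-pool-1": 4000,   # pool containing, say, 30 virtual machines
    "resource-pool-2": 4000,   # pool containing, say, 30 virtual machines
    "reparented-vm":   4000,   # a single 4 vCPU / 16 GB virtual machine
}

total_cpu_shares = sum(root_siblings.values())
for name, cpu_shares in root_siblings.items():
    print(f"{name}: {cpu_shares / total_cpu_shares:.0%} of the root CPU shares")
# During contention the single VM is entitled to one third of the cluster,
# the same as an entire pool of virtual machines.
```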
Go to Part 3: Impact of oversized virtual machine.
Impact of oversized virtual machines part 1
Recently we had an internal discussion about the overhead an oversized virtual machine generates on the virtual infrastructure. An oversized virtual machine is a virtual machine that consistently uses less capacity than its configured capacity. Many organizations follow vendor recommendations and/or provision virtual machines sized according to the wishes of the customer, i.e. more resources equals better performance. By oversizing the virtual machine you can introduce the following overhead, or even worse, decrease the performance of the virtual machine or other virtual machines inside the cluster.
Note: This article does not focus on large virtual machines that are correctly configured for their workloads.
Memory overhead
Every virtual machine running on an ESX host consumes some memory overhead in addition to the current usage of its configured memory. This extra space is needed by ESX for internal VMkernel data structures such as the virtual machine frame buffer and the mapping table for memory translation, i.e. mapping virtual machine physical memory to machine memory.
The VMkernel calculates a static overhead for the virtual machine based on the number of vCPUs and the amount of configured memory. Static overhead is the minimum overhead that is required for the virtual machine to start up. DRS and the VMkernel use this metric for admission control and vMotion calculations. If the ESX host is unable to provide the unreserved resources for the memory overhead, the virtual machine will not be powered on. In the case of vMotion, the destination ESX host must be able to back the virtual machine reservation and the static overhead, otherwise the vMotion will fail.
The following table displays common static memory overhead values (in MB) encountered in vSphere 4.1. For example, a 4 vCPU, 8GB virtual machine will be assigned a memory overhead reservation of 413.91 MB, regardless of whether it will use its configured resources or not.
| Configured Memory (MB) | 2 vCPUs | 4 vCPUs | 8 vCPUs |
| 2048 | 198.20 | 280.53 | 484.18 |
| 4096 | 242.51 | 324.99 | 561.52 |
| 8192 | 331.12 | 413.91 | 716.19 |
| 16384 | 508.34 | 591.76 | 1028.07 |
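As a quick reference, the table can be expressed as a small lookup; this is a minimal sketch using only the vSphere 4.1 values listed above.

```python
# Static memory overhead lookup for vSphere 4.1, using the values from the
# table above (configured memory in MB -> overhead in MB per vCPU count).
static_overhead_mb = {
    2048:  {2: 198.20, 4: 280.53, 8: 484.18},
    4096:  {2: 242.51, 4: 324.99, 8: 561.52},
    8192:  {2: 331.12, 4: 413.91, 8: 716.19},
    16384: {2: 508.34, 4: 591.76, 8: 1028.07},
}

def overhead(memory_mb: int, vcpus: int) -> float:
    """Return the static memory overhead reservation in MB."""
    return static_overhead_mb[memory_mb][vcpus]

print(overhead(8192, 4))  # 413.91 MB for a 4 vCPU, 8 GB virtual machine
```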
The VMkernel treats the virtual machine overhead reservation the same as a VM-level memory reservation: it will not reclaim this memory once it has been used. Furthermore, memory overhead reservations are not shared by transparent page sharing.
Shares (size does not translate into priority)
By default each virtual machine is assigned a specific number of shares. The number of shares depends on the share level (low, normal or high), the number of vCPUs and the amount of configured memory.
| Share Level | Low | Normal | High |
| Shares per CPU | 500 | 1000 | 2000 |
| Shares per MB | 5 | 10 | 20 |
I.e. a virtual machine configured with 4 vCPUs and 8GB of memory at the normal share level receives 4000 CPU shares and 81920 memory shares. By tying the number of shares to the amount of configured resources, this "algorithm" indirectly implies that a larger virtual machine should receive a higher priority during resource contention. This is not true, as some business-critical applications run perfectly well on virtual machines configured with low amounts of resources.
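The default share assignment can be written down as a small helper; a minimal sketch based on the share-level table above.

```python
# Default share assignment per share level (values from the table above).
CPU_SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}
MEM_SHARES_PER_MB   = {"low": 5,   "normal": 10,   "high": 20}

def default_shares(vcpus: int, memory_mb: int, level: str = "normal"):
    """Return (cpu_shares, memory_shares) for a virtual machine."""
    return (vcpus * CPU_SHARES_PER_VCPU[level],
            memory_mb * MEM_SHARES_PER_MB[level])

print(default_shares(4, 8192))  # (4000, 81920) for a 4 vCPU, 8 GB VM
```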
Oversized VMs on NUMA architecture
The vSphere 4.1 CPU scheduler has been optimized to handle virtual machines that contain more vCPUs than the number of available cores in one physical NUMA node. Such a virtual machine (a wide-VM) will be spread across the minimum number of NUMA nodes, but memory locality will be reduced, as memory is distributed among its home NUMA nodes. This means that a vCPU running on one NUMA node might need to fetch memory from another NUMA node, leading to unnecessary latency and CPU wait states, which can increase %ready time for other virtual machines in highly consolidated environments.
Wide-VM NUMA support is of great use when the virtual machine actually runs a load comparable to its configured size, and it reduces overhead compared to the 3.5/4.0 CPU scheduler, but it is still better to size the virtual machine equal to or smaller than the number of available cores in a NUMA node.
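A simple sizing check along these lines is sketched below; the per-node core and memory figures are assumptions for an example host, not values queried from ESX.

```python
# Quick sizing check (illustrative): does the VM fit within a single NUMA node?
cores_per_numa_node = 6
memory_per_numa_node_mb = 49152   # 48 GB of memory local to each node

def fits_single_numa_node(vcpus: int, memory_mb: int) -> bool:
    """True when both vCPU count and memory fit inside one NUMA node."""
    return vcpus <= cores_per_numa_node and memory_mb <= memory_per_numa_node_mb

print(fits_single_numa_node(4, 16384))   # True: stays local to one node
print(fits_single_numa_node(8, 65536))   # False: scheduled as a wide-VM
```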
More information about CPU scheduling and NUMA architectures can be found here:
http://frankdenneman.nl/2010/09/esx-4-1-numa-scheduling/
Go to Part 2: Impact of oversized virtual machine on HA and DRS
Enhanced vMotion Compatibility
Enhanced vMotion Compatibility (EVC) has been available for a while now, but it seems to be adopted slowly. Recently VMguru.nl featured an article "Challenge: vCenter, EVC and dvSwitches" which illustrates another case where the customer did not enable EVC when creating the cluster. There seems to be a lot of misunderstanding about EVC and the impact it has on the cluster when it is enabled.
What is EVC?
VMware Enhanced VMotion Compatibility (EVC) facilitates VMotion between different CPU generations through use of Intel Flex Migration and AMD-V Extended Migration technologies. When enabled for a cluster, EVC ensures that all CPUs within the cluster are VMotion compatible.
What is the benefit of EVC?
Because EVC allows you to migrate virtual machines between different generations of CPUs, you can mix older and newer server generations in the same cluster and still migrate virtual machines between these hosts with VMotion. This makes it easier to add new hardware to your existing infrastructure and helps extend the value of your existing hosts.
EVC forces newer processors to behave like old processors
Well, this is not entirely true; EVC creates a baseline so that all the hosts in the cluster advertise the same feature set. The EVC baseline does not disable the features, but indicates that a specific feature is not available to the virtual machine.
Now it is crucial to understand that EVC only focuses on CPU features, such as SSE or AMD 3DNow! instructions, and not on CPU speed or cache sizes. Hardware virtualization optimization features such as Intel VT FlexMigration or AMD-V Extended Migration, and Memory Management Unit virtualization such as Intel EPT or AMD RVI, will still be available to the VMkernel even if EVC is enabled. As mentioned before, EVC only focuses on the availability of features and instructions of the existing CPUs in the cluster, for example SIMD instructions such as the SSE instruction set.
Let's take a closer look. When selecting an EVC baseline, vCenter applies the baseline feature set of the selected CPU generation and exposes those specific features. When an ESX host joins the cluster, only those CPU instructions that are new and unique to that specific CPU generation are hidden from the virtual machines. For example, if the cluster is configured with an Intel Xeon Core i7 baseline, the standard Intel Xeon Core 2 features plus SSE4.1, SSE4.2, POPCNT and RDTSCP are made available to all the virtual machines. When an ESX host with a Westmere (32nm) CPU joins the cluster, its additional CPU instruction sets such as AES/AESNI and PCLMULQDQ are suppressed.
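Conceptually the baseline works like a set intersection, as in the sketch below; the feature names are only a rough illustration of the masking, not the complete CPUID feature lists.

```python
# Conceptual sketch of EVC baseline masking (feature names are illustrative):
# hosts may offer more than the baseline, but VMs only see the baseline set.
baseline_core_i7 = {"SSE3", "SSSE3", "SSE4.1", "SSE4.2", "POPCNT", "RDTSCP"}
westmere_host    = baseline_core_i7 | {"AES", "AESNI", "PCLMULQDQ"}

exposed_to_vms  = westmere_host & baseline_core_i7
hidden_from_vms = westmere_host - baseline_core_i7
print("Hidden from VMs:", sorted(hidden_from_vms))  # ['AES', 'AESNI', 'PCLMULQDQ']
```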

As mentioned in the various VMware KB articles, it is possible, but unlikely, that an application running in a virtual machine would benefit from these features, and that the application's performance would be lower as a result of using an EVC mode that does not include them.
DRS-FT integration and building block approach
When EVC is enabled in vSphere 4.1, DRS is able to select an appropriate ESX host for placing FT-enabled virtual machines and is able to load-balance these virtual machines, resulting in a better balanced cluster, which likely has a positive effect on the performance of the virtual machines. More info can be found in the article "DRS-FT integration".
Equally interesting is the building-block approach: by enabling EVC, architects can use a predefined set of hosts and resources and gradually expand the ESX clusters. Not every company buys compute power by the truckload; with EVC enabled, clusters can grow by adding ESX hosts with newer processor versions.
One potential caveat is mixing hardware of different major generations in the same cluster. As Irfan Ahmad so eloquently put it, "not all MHz are created equal": newer major generations offer better performance per CPU clock cycle, creating a situation where a virtual machine getting 500 MHz on one ESX host is migrated to another ESX host where that 500 MHz is equivalent to 300 MHz on the original host in terms of application-visible performance. This increases the complexity of troubleshooting performance problems.
Recommendations?
Enabling EVC is unlikely to cause any performance loss. With EVC enabled, DRS-FT integration is supported and organizations are more flexible in expanding clusters over longer periods of time, so we recommend enabling EVC on clusters. But will it be a panacea for the stream of new major CPU generation releases? Unfortunately not! One possibility is to treat the newest hardware (major releases) as a higher tier of service than the older hardware and create new clusters for it.
European distributor for HA and DRS book
As of today, our book "vSphere 4.1 HA and DRS Technical Deepdive" can be ordered via ComputerCollectief. ComputerCollectief is a Dutch computer book and software reseller and ships to most European countries. By using ComputerCollectief, we hope to avoid the long shipping times and accompanying costs.
Go check it out. http://www.comcol.nl/detail/73133.htm
Comcol expects to be able to deliver at the end of this month.
Dutch vBeers
Simon Long of The SLOG is introducing vBeers to Holland. I’ve copied the text from his vBeers blog article.
Every month Simon Seagrave and I try to organise a social get-together of like-minded virtualization enthusiasts held in a pub in central London (and now Amsterdam). We like to call it vBeers. Before I go on, I would just like to state that although it's called vBeers, you do NOT have to drink beer or any other alcohol for that matter. This isn't just an excuse to get blind drunk.
We came up with the idea whilst on the Gestalt IT Tech Field Day back in April. We were chatting and we both recognised that we don't get together enough to catch up, mostly due to busy work schedules and private lives. We felt that if we had a set date each month, the likelihood of us actually making that date would be higher than with previous attempts. So the idea of vBeers was born.
The first Amsterdam vBeers will be held on Thursday the 16th of December, starting at 6:30pm in the 'Herengracht Cafe', which is located close to Leidseplein and Dam Square. This venue serves a fine selection of beers along with soft drinks and bar food.
Drinks will not be paid for and there will not be a tab. When you buy a drink, please pay for it, as no one else will be paying for your drinks.
* Location: The ‘Herengracht Cafe‘ Amsterdam
* Address: Herengracht 435, Herengracht/Leidsestraat
* Nearest Tram Station: Koningsplein – Lijn 1,2,5
* Time: 6:30pm
* Location: Map
