
frankdenneman.nl


Node Interleaving: Enable or Disable?

December 28, 2010 by frankdenneman

There seems to be a lot of confusion about this BIOS setting; I receive lots of questions on whether to enable or disable node interleaving. I guess the term “enable” makes people think it is some sort of performance enhancement. Unfortunately the opposite is true, and it is strongly recommended to keep the default setting and leave Node Interleaving disabled.

Node interleaving option only on NUMA architectures
The node interleaving option exists on servers with a non-uniform memory access (NUMA) system architecture; the Intel Nehalem and AMD Opteron are both NUMA architectures. In a NUMA architecture multiple nodes exist. Each node contains a CPU and memory and is connected to the other nodes via a NUMA interconnect. A pCPU uses its onboard memory controller to access its own “local” memory and reaches the remaining “remote” memory via the interconnect. Because memory can exist in different locations, the system experiences “non-uniform” memory access times.

Node interleaving disabled equals NUMA
By using the default setting of Node Interleaving (disabled), the system builds a System Resource Allocation Table (SRAT). ESX uses the SRAT to understand which memory bank is local to a pCPU and tries* to allocate local memory to each vCPU of the virtual machine. By using local memory, the CPU can use its own memory controller, does not have to compete for access to the shared interconnect (bandwidth), and reduces the number of hops needed to access memory (latency).

* If the local memory is full, ESX will resort to storing memory pages in remote memory, because this will always be faster than swapping them out to disk.

Node interleaving enabled equals UMA
If Node interleaving is enabled, no SRAT will be built by the system and ESX will be unaware of the underlying physical architecture.

ESX will treat the server as a uniform memory access (UMA) system and perceive the available memory as one contiguous area. This introduces the possibility of storing memory pages in remote memory, forcing the pCPU to transfer data over the NUMA interconnect each time the virtual machine wants to access that memory.

By leaving Node Interleaving disabled, ESX can use the System Resource Allocation Table to select the most optimal placement of memory pages for the virtual machines. Therefore it’s recommended to leave this setting disabled, even though it may sound as if you are preventing the system from running more optimally.

Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman

Filed Under: NUMA, VMware Tagged With: node interleaving, NUMA, VMware

Impact of oversized virtual machines part 2

December 17, 2010 by frankdenneman

In part 1 of this series of posts on the impact of oversized virtual machines, the NUMA architecture, memory overhead reservations and share levels were reviewed; part 2 zooms in on the impact of memory overhead reservations and share levels on HA and DRS.
Impact of memory overhead reservation on HA Slot size
The VMware High Availability admission control policy “Host failures cluster tolerates” calculates a slot size to determine the maximum number of virtual machines that can be active in the cluster without violating failover capacity. This admission control policy determines the HA cluster slot size from the largest CPU reservation and the largest memory reservation plus its memory overhead reservation. If the virtual machine with the largest reservation (which could be an appropriately sized reservation) is oversized, its memory overhead reservation can still have a substantial impact on the slot size.
The HA admission control policy “Percentage of Cluster Resources Reserved” calculates the memory component of its mechanism by summing the reservation plus the memory overhead of each virtual machine, allowing the memory overhead reservation to have an even bigger impact on admission control than the calculation done by the “Host failures cluster tolerates” policy.
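To make the difference between the two policies concrete, here is a rough Python sketch of both calculations; the VM names and reservation values are hypothetical, and the overhead figures are copied from the static overhead table in part 1.

```python
# Hypothetical VM inventory: (name, CPU reservation in MHz,
# memory reservation in MB, static memory overhead in MB).
vms = [
    ("small-app",  500, 1024, 198.20),
    ("database",  1000, 2048, 280.53),
    ("oversized",    0,    0, 716.19),  # 8 vCPU / 8 GB VM, barely used
]

# "Host failures cluster tolerates": slot size is driven by the largest
# CPU reservation and the largest memory reservation plus its overhead.
cpu_slot_mhz = max(cpu for _, cpu, _, _ in vms)
mem_slot_mb = max(mem + overhead for _, _, mem, overhead in vms)
print(f"Slot size: {cpu_slot_mhz} MHz / {mem_slot_mb:.2f} MB")

# "Percentage of Cluster Resources Reserved": the memory component sums the
# reservation plus memory overhead of every VM, so each oversized VM adds
# its full overhead reservation to the reserved total.
reserved_mem_mb = sum(mem + overhead for _, _, mem, overhead in vms)
print(f"Reserved memory component: {reserved_mem_mb:.2f} MB")
```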
DRS initial placement
DRS uses a worst-case scenario during initial placement. Because DRS cannot determine the resource demand of a virtual machine that is not running, it assumes that both the memory demand and the CPU demand are equal to the configured size. Oversizing virtual machines therefore decreases the options for finding a suitable host for the virtual machine. If DRS cannot guarantee that the full 100% of the resources provisioned for this virtual machine can be used, it will vMotion virtual machines away so that it can power on this single virtual machine. If not enough resources are available, DRS will not allow the virtual machine to be powered on.
Shares and resource pools
When placing a virtual machine inside a resource pool, its shares are relative to the other virtual machines (and resource pools) inside the pool. Shares are relative to all the other components sharing the same parent; an easier way to put it is to call it a sibling share level. Therefore the numeric share values are not directly comparable across pools, because they are children of different parents.

By default a resource pool is configured with a share amount equal to that of a 4 vCPU, 16GB virtual machine. As mentioned in part 1, shares are relative to the configured size of the virtual machine, implicitly stating that size equals priority.
Now suppose three 4 vCPU, 16GB virtual machines are reparented to the cluster root, next to resource pool 1 and resource pool 2. Their share values are interpreted in the context of the root pool, and they receive the same priority as resource pool 1 and resource pool 2. This is not only wrong, but also dangerous in a denial-of-service sense: a virtual machine running at the same level as resource pools can suddenly find itself entitled to nearly all cluster resources.
Because of this default share distribution we always recommend avoiding placing virtual machines at the same level as resource pools. Unfortunately a virtual machine can end up reparented to the cluster root level when it is manually migrated using the GUI; the current workflow defaults to the cluster root level instead of the virtual machine’s current resource pool. Because of this, it’s recommended to increase the number of shares of the resource pool to reflect its priority level. More info about shares on resource pools can be found in Duncan’s post on yellow-bricks.com. A short Python sketch of the sibling share math follows below.
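The sketch assumes the default normal share level (1000 CPU shares per vCPU) and a hypothetical cluster with two resource pools and three reparented 4 vCPU VMs directly under the cluster root.

```python
# Default normal share level: 1000 CPU shares per configured vCPU.
SHARES_PER_VCPU = 1000

# A resource pool defaults to the share value of a 4 vCPU / 16 GB VM,
# so its CPU shares equal those of a single 4 vCPU virtual machine.
rp_shares = 4 * SHARES_PER_VCPU

# Two resource pools (each potentially backing dozens of VMs) and three
# reparented 4 vCPU VMs all sit directly under the cluster root.
siblings = {
    "Resource pool 1": rp_shares,
    "Resource pool 2": rp_shares,
    "VM-1": 4 * SHARES_PER_VCPU,
    "VM-2": 4 * SHARES_PER_VCPU,
    "VM-3": 4 * SHARES_PER_VCPU,
}

total = sum(siblings.values())
for name, shares in siblings.items():
    # During contention each sibling is entitled to shares/total of the
    # root pool: every single reparented VM competes as an equal of a
    # resource pool that may back an entire department.
    print(f"{name}: {shares / total:.0%} of the cluster root resources")
```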
Go to Part 3: Impact of oversized virtual machine.

Filed Under: DRS, VMware Tagged With: DRS, HA, memory overhead reservation, Shares, VMware

Impact of oversized virtual machines part 1

December 16, 2010 by frankdenneman

Recently we had an internal discussion about the overhead an oversized virtual machine generates on the virtual infrastructure. An oversized virtual machine is a virtual machine that consistently uses less capacity than its configured capacity. Many organizations follow vendor recommendations and/or provision virtual machines sized according to the wishes of the customer, i.e. more resources equals better performance. By oversizing the virtual machine you can introduce the following overhead, or even worse, decrease the performance of the virtual machine or of other virtual machines inside the cluster.
Note: This article does not focus on large virtual machines that are correctly configured for their workloads.
Memory overhead
Every virtual machine running on an ESX host consumes some memory overhead in addition to the current usage of its configured memory. This extra space is needed by ESX for internal VMkernel data structures such as the virtual machine frame buffer and the mapping table for memory translation, i.e. mapping physical virtual machine memory to machine memory.
The VMkernel calculates a static overhead for the virtual machine based on the number of vCPUs and the amount of configured memory. Static overhead is the minimum overhead required for virtual machine startup; DRS and the VMkernel use this metric for admission control and vMotion calculations. If the ESX host is unable to provide the unreserved resources for the memory overhead, the VM will not be powered on. In the case of vMotion, the destination ESX host must be able to back both the virtual machine reservation and the static overhead, otherwise the vMotion will fail.
The following table displays a list of common static memory overhead values encountered in vSphere 4.1. For example, a 4 vCPU, 8GB virtual machine will be assigned a memory overhead reservation of 413.91 MB, regardless of whether it will use its configured resources or not.

Memory (MB) 2vCPUs 4vCPUs 8vCPUs
2048 198.20 280.53 484.18
4096 242.51 324.99 561.52
8192 331.12 413.91 716.19
16384 508.34 591.76 1028.07

The VMkernel treats the virtual machine overhead reservation the same as a VM-level memory reservation: it will not reclaim this memory once it has been used, and memory overhead reservations are not shared by transparent page sharing.
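As a quick illustration, the sketch below encodes the table above as a lookup and computes the unreserved memory a host must be able to back at power-on; the function name is purely illustrative.

```python
# Static memory overhead (MB) from the vSphere 4.1 table above,
# keyed on (configured memory in MB, number of vCPUs).
STATIC_OVERHEAD_MB = {
    (2048, 2): 198.20,  (2048, 4): 280.53,  (2048, 8): 484.18,
    (4096, 2): 242.51,  (4096, 4): 324.99,  (4096, 8): 561.52,
    (8192, 2): 331.12,  (8192, 4): 413.91,  (8192, 8): 716.19,
    (16384, 2): 508.34, (16384, 4): 591.76, (16384, 8): 1028.07,
}

def power_on_requirement_mb(memory_mb, vcpus, reservation_mb=0.0):
    """Unreserved memory the host must be able to back before the VM can
    power on (or be vMotioned in): VM-level reservation + static overhead."""
    return reservation_mb + STATIC_OVERHEAD_MB[(memory_mb, vcpus)]

# An oversized 4 vCPU / 8 GB VM without any reservation still claims
# roughly 414 MB of overhead reservation on its host.
print(power_on_requirement_mb(8192, 4))
```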
Shares (size does not translate into priority)
By default each virtual machine is assigned a specific amount of shares. The amount of shares depends on the share level (low, normal or high), the number of vCPUs and the amount of memory.

Share Level Low Normal High
Shares per CPU 500 1000 2000
Shares per MB 5 10 20

For example, a virtual machine configured with 4 vCPUs and 8GB of memory at the normal share level receives 4000 CPU shares and 81920 memory shares. By relating the amount of shares to the amount of configured resources, this “algorithm” indirectly implies that a larger virtual machine should receive a higher priority during resource contention. This is not true, as some business-critical applications run perfectly well on virtual machines configured with low amounts of resources.
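The sketch below applies the per-vCPU and per-MB values from the table to show how an oversized VM is silently handed a higher share-based priority than a right-sized, business-critical VM; the two example VMs are hypothetical.

```python
# Share values per vCPU and per MB of configured memory, per share level.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}
SHARES_PER_MB = {"low": 5, "normal": 10, "high": 20}

def default_shares(vcpus, memory_mb, level="normal"):
    """Default CPU and memory shares assigned to a virtual machine."""
    return vcpus * SHARES_PER_VCPU[level], memory_mb * SHARES_PER_MB[level]

# Oversized 4 vCPU / 8 GB VM versus a right-sized 1 vCPU / 2 GB VM running
# a business-critical application: during contention the oversized VM is
# entitled to four times the shares, regardless of workload importance.
print(default_shares(4, 8192))  # (4000, 81920)
print(default_shares(1, 2048))  # (1000, 20480)
```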
Oversized VMs on NUMA architecture
The vSphere 4.1 CPU scheduler has been optimized to handle virtual machines that contain more vCPUs than there are cores available in one NUMA node. Such a virtual machine (a wide-VM) will be spread across the minimum number of NUMA nodes, but memory locality is reduced, as memory is distributed among its home NUMA nodes. This means that a vCPU running on one NUMA node might need to fetch memory from another of its NUMA nodes, leading to unnecessary latency and CPU wait states, which can increase %ready time for other virtual machines in highly consolidated environments.
Wide-VM NUMA support is of great use when the virtual machine actually runs a load comparable to its configured size, and it reduces overhead compared to the 3.5/4.0 CPU scheduler, but it is still better to try to size the virtual machine equal to or smaller than the number of available cores in a NUMA node.
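A minimal sketch of the wide-VM split, assuming a hypothetical host with two NUMA nodes of six cores each.

```python
import math

# Hypothetical host: two NUMA nodes with six cores each.
cores_per_numa_node = 6

# An 8 vCPU wide-VM does not fit in a single node, so the scheduler splits
# it across the minimum number of NUMA nodes and distributes its memory
# across those home nodes, reducing memory locality.
vcpus = 8
numa_clients = math.ceil(vcpus / cores_per_numa_node)
print(f"8 vCPU VM is split across {numa_clients} NUMA nodes")

# A right-sized 4 vCPU VM fits in one node and keeps its memory local.
print(f"4 vCPU VM needs {math.ceil(4 / cores_per_numa_node)} NUMA node")
```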
More information about CPU scheduling and NUMA architectures can be found here:
http://frankdenneman.nl/2010/09/esx-4-1-numa-scheduling/
Go to Part 2: Impact of oversized virtual machine on HA and DRS

Filed Under: VMware

Disallowing multiple vm console sessions

November 30, 2010 by frankdenneman

Currently I’m involved in a high-security virtual infrastructure design and we are required to reduce the number of entry points to the virtual infrastructure. One of the requirements is to allow only a single session to the virtual machine console. Due to the increasing awareness of and demand for security in virtual infrastructures, more organizations might want to apply this security setting.
1. Turn off the virtual machine.
2. Open the Configuration Parameters of the VM to edit the advanced configuration settings.
3. Add Remote.Display.maxConnections with a value of 1.
4. Power on the virtual machine.
Update: Arne Fokkema created a PowerCLI function to automate configuring this setting throughout your virtual infrastructure. You can find the PowerCLI function on ICT-freak.nl.
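For reference, a rough pyVmomi (Python) sketch of the same change; the vCenter address, credentials and VM name are placeholders, and the advanced setting key is the one named in the steps above.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; substitute your own vCenter and credentials.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Locate the (powered-off) virtual machine by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "secure-vm")  # placeholder name

# Add the advanced setting limiting the console to a single session.
spec = vim.vm.ConfigSpec(extraConfig=[
    vim.option.OptionValue(key="Remote.Display.maxConnections", value="1")
])
vm.ReconfigVM_Task(spec=spec)

Disconnect(si)
```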

Filed Under: VMware Tagged With: restrict VM console, Security

Provider vDC: cluster or resource pool?

September 24, 2010 by frankdenneman

Duncan’s article on vCloud Allocation models states that:

a provider vDC can be a VMware vSphere Cluster or a Resource Pool …

Although vCloud Director offers the ability to map Provider vDCs to Clusters or Resource Pools, it might be better to choose the less complex solution. This article zooms in on the compute resource management constructs, and particularly on the choice between assigning a VMware Cluster or a Resource Pool to a Provider vDC and on the placement of Organization vDCs. I strongly suggest visiting Yellow Bricks to read all vCloud Director posts; these posts explain the new environment / cloud model used by VMware very thoroughly.

Let’s do a quick rehash of these constructs before discussing the choice between a Cluster-based and a Resource Pool-based Provider vDC.

Provider vDC and Organization vDC
In the vCloud a construct named the vDC exists; vDC stands for Virtual Data Center. Two types of vDCs exist: Provider vDCs and Organization vDCs. A Provider vDC is used to offer a single type of compute resources and a single type of storage resources. This means that Provider vDCs are created to segment resources based on resource characteristics (tiering) or quantity of resources (capacity). Basically, a Provider vDC functions as an SLA construct in the vCloud. At the vSphere layer, a VMware vSphere Cluster or a Resource Pool can be used to provide the Provider vDC with raw virtual infrastructure resources. Now the fun part is that using Resource Pools basically contradicts the whole idea behind a Provider vDC, but we will discuss that later.

An Organization vDC (Org vDC) is an allocation out of the Provider vDC (pVDC); in other words, the resources provided by the pVDC are consumed by the Org vDC. Organization vDCs inherit the resource types (tiering/capacity) from the Provider vDC. At the vSphere level this means that a Resource Pool is created per Org vDC, which carves out resources from the Provider vDC using the resource allocation settings Reservation, Shares and Limit for compute resources.

Note: a vDC is not identical to a vSphere Resource Pool; a vDC provides storage in addition to compute resources (leveraging resource pools), whereas a Resource Pool only offers compute resources (CPU and memory). Compute resource management is done at the vSphere level, while storage is enforced and maintained at the vCloud Director level.
vCloud Director uses allocation models to define different usage levels of Reservations and Limits. The share levels are identical throughout all allocation models; each model uses the normal share level setting.

Allocation Models
Each Organization vDC is configured with an allocation model; three different types of allocation models exist:

  • Pay As You Go
  • Allocation Pool
  • Reservation Pool

Each allocation model has a unique set of resource allocation settings, and each model uses Resource Pool level and virtual machine level resource allocation settings differently. Read the vCD allocation models article on Yellow-Bricks.com.
Note: reservations on a resource pool act differently than reservations at the VM level; for a refresher please read the articles “Resource Pools memory reservations” and “Impact of memory reservations“. In addition, CPU reservations behave differently from memory reservations; please read the article “Reservations and CPU scheduling”.
Now let’s visualize the difference between a pVDC aligned with a Cluster and a pVDC aligned with a Resource Pool:

[Figure: Aligning a pVDC to a Cluster or a Resource Pool]
Using Resource Pools instead of Clusters
One thing immediately becomes obvious: when using a Resource Pool to provide CPU and memory resources to the pVDC, you share the cluster resources with other pVDCs. One might argue for creating only one Resource Pool below the cluster level to create some sort of buffer, but creating a single Resource Pool below the cluster level and assigning a pVDC to it will leave a certain amount of cluster resources unused; by default, a Resource Pool can claim up to a maximum of 94% of its parent Resource Pool.

By using multiple Provider vDCs in one cluster you abandon the idea of segmenting resources based on resource characteristics and quantity (tiering and capacity). Because each Resource Pool spans the entire cluster, the pVDCs will schedule virtual machines on every host available in the cluster. The Resource Pool model also introduces a whole new and complex resource management construct all by itself. Let’s focus on the challenges this model introduces:

Resource Pool creation
When creating a Provider vDC, a Cluster or Resource Pool must be selected, which means the Resource Pool must be manually created before the Provider vDC can be mapped to it. During the creation of this Resource Pool the admin must specify the resource allocation settings. The Reservation, Shares and Limit settings of a Resource Pool are not changed dynamically when additional ESX hosts are added to the cluster; the admin must change (increase) the Reservation and Limit settings each time new hosts are added to the cluster.

The second drawback of the Resource Pool model is sizing. Because multiple Provider vDC Resource Pools will exist beneath the root Resource Pool (cluster) level, the admin/architect needs to calculate a proper resource allocation ratio for the existing Provider vDCs.
Mapping a Provider vDC to a Resource Pool results in manually recalculating the resource allocation settings each time a new tenant is introduced and its new Org vDC joins the Provider vDC.

Sibling Share Level
If the “Pay As You Go” or “Allocation Pool” models are used, some resources might be provided via a “burstability” model. When creating an Organization vDC, a guaranteed amount of resources must be specified as well as an upper limit known as the allocation. The difference between the total allocated resources and the specified guaranteed resources is a pool of resources available to that Organization vDC; however, it is important to note that those resources are not certain to be available at any given point in time. This is called the burstability space.

[Figure: Organization vDC burstability space]

These “burstable” resources are allocated based on shares in times of contention. Shares specify the priority of a virtual machine or Resource Pool relative to other Resource Pools and/or virtual machines with the same parent in the resource hierarchy; the key point is that share values can be compared directly only among siblings. Each Provider vDC is therefore a sibling of the other Provider vDCs in the cluster, and they receive resources from their parent (the root Resource Pool) based on their resource entitlement. That means that this model:

[Figure: Resource Pool sibling share level]
Translates into this model:
[Figure: Allocation based on shares]

Resource Entitlement
Resource Pool and virtual machine resource entitlements are based on various statistics and some estimation techniques. DRS computes a resource entitlement for each virtual machine based on the configured shares, reservations and limits of the virtual machine and its Resource Pools, as well as the current demand of the virtual machines and Resource Pools, the memory size, the working set and the degree of current resource contention.

As mentioned before, this burstable space is allocated based on the amount of shares and on the active utilization (working set) used when calculating the resource entitlement. Virtual machines that are idling aren’t competing for resources, so they won’t get any new resources assigned and therefore the Provider vDC will not demand them from the root Resource Pool. Be aware that the resource entitlement is calculated both at host-level scheduling (VMkernel) and at global scheduling (DRS). DRS creates a lump sum of resources and divides this across the Resource Pools and their children; this lump sum is recalculated every 5 minutes.

Introducing an additional layer of Provider vDC Resource Pools between the cluster and the Organization vDC Resource Pools not only complicates the resource entitlement calculation but also creates additional, unnecessary overhead on DRS. Besides the 300-second invocation period, DRS is also invoked each time a virtual machine is powered off, when a resource setting of a virtual machine or Resource Pool is changed, or when a Resource Pool or a virtual machine is moved into or out of the Resource Pool hierarchy. This is why the Resource Pool tree should be kept as flat as possible; additional layers complicate the resource calculation and distribution.

If you decide to map a Provider vDC to a Resource Pool, it is recommended to set the amount of CPU and memory resources of the pVDC Resource Pool identical to the combined amount of resources allocated to its Org vDCs. Accumulating all Org vDC allocation settings and setting the reservation on the Provider vDC equal to that sum removes the burstable space at the pVDC level; only the siblings inside the Provider vDC will have to compete for resources during contention.
[Figure: Guaranteed resources on the pVDC]
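A small sketch of that recommendation, using hypothetical Org vDC allocation figures.

```python
# Hypothetical CPU allocations (GHz) of the Org vDCs inside one Provider vDC.
org_vdc_allocations_ghz = {"org-vdc-a": 10.0, "org-vdc-b": 16.0, "org-vdc-c": 6.0}

# Set the pVDC resource pool reservation equal to the sum of its Org vDC
# allocations, so no burstable space exists at the Provider vDC level and
# only the Org vDCs inside the pVDC compete with each other during contention.
pvdc_cpu_reservation_ghz = sum(org_vdc_allocations_ghz.values())
print(f"pVDC resource pool CPU reservation: {pvdc_cpu_reservation_ghz} GHz")
```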
Placement of Organization vDCs in Provider vDCs
Proper resource management is very complicated in a virtual infrastructure or vCloud environment. Each allocation model uses a different combination of resource allocation settings at both the Resource Pool and the virtual machine level, thereby introducing different types of resource entitlement behavior. Mixing allocation models inside a Provider vDC makes capacity management and capacity planning a true nightmare. It is advised to create a Provider vDC per allocation model. This means that (preferably) a Provider vDC is mapped to a Cluster and this cluster hosts only “Pay As You Go”, “Allocation Pool” or “Reservation Pool” type Organization vDCs.

[Figure: Provider vDC per VMware ESX Cluster]

Words of advice
Using different allocation models within a Provider vDC makes it challenging to achieve a proper level of utilization and flexibility all by itself. Using Resource Pools as the compute resource construct for Provider vDCs makes it, in my opinion, incredibly complex. Using Resource Pools instead of Clusters also deviates from the intention with which Provider vDCs were created (segmenting tiering and capacity). Although it is possible to map a Provider vDC to a Resource Pool, it is wiser to map Provider vDCs to Clusters only.

Avoid using different types of allocation models within a Provider vDC; mixing allocation models makes proper capacity management and capacity planning unnecessarily difficult.

Best practices:
Map each Provider vDC to a VMware vSphere Cluster.
Use the same type of allocation model for all Organization vDCs inside a Provider vDC.

Filed Under: VMware Tagged With: cluster, DRS, PvDC, resource pools
