
frankdenneman.nl

2 days left to vote

September 19, 2010 by frankdenneman

Eric Siebert, owner of vSphere-land.com, has started the second round of the bi-annual top 25 VMware virtualization blogs voting. The last voting round was back in January, and this is your chance to vote for your favorite virtualization bloggers and help determine the top 25 blogs of 2010.
This year my blog was nominated for the first time and entered the top 25 (no. 14), and I hope to stay in the top 25 after this voting round. My articles focus primarily on resource management and cover topics such as DRS and the CPU and memory schedulers, to help you make an informed decision when designing or managing a virtual infrastructure. As noble as this may sound, I know these topics are not mainstream, and I can understand that not everybody is interested in reading about them week in, week out.
Fortunately, I’ve managed to get a blog post listed in the Top 5 Planet V12n blog post list at least once every month, and my posts are referenced on a regular basis by sites such as Yellow-bricks.com (Duncan Epping), Scott Lowe, NTpro.nl (Eric Sloof), Chad Sakac and of course many others. So it seems I’m doing something right.
This is my list of the top 10 articles I’ve written this year:

  • ESX 4.1 NUMA scheduling
  • vCloud Director Architecture
  • VM to Host Affinity rule
  • Memory Reclamation, when and how?
  • Reservations and CPU scheduling
  • Resource pools and memory reservations
  • ESX ALUA, TPGS and HP CA
  • Removing an orphaned Nexus DVS
  • Impact of Host local swap on HA and DRS
  • Sizing VMs and NUMA nodes


Closing remarks

This year I’ve gotten to know a lot of bloggers personally, and one thing that amazed me was the time and effort each of them puts into their work in their spare time. Please take a couple of minutes to vote at vSphere-land, whether it’s for me or any of the other bloggers listed, and reward them for their hard work.

Filed Under: VMware Tagged With: Top 25 Bloggers, VMware

ESX 4.1 NUMA Scheduling

September 13, 2010 by frankdenneman

VMware has made some changes to the CPU scheduler in ESX 4.1; one of the changes is the support for wide virtual machines. A wide virtual machine contains more vCPUs than the total number of available cores in one NUMA node. Wide VMs will be discussed after a quick rehash of NUMA.

NUMA
NUMA stands for Non-Uniform Memory Access, which translates into a variance in memory access latencies. Both the AMD Opteron and the Intel Nehalem are NUMA architectures. A processor and its memory form a NUMA node. Access to memory within the same NUMA node is considered local access; access to memory belonging to another NUMA node is considered remote access.

NUMA Local and Remote Memory
Remote memory access is slower because the request has to traverse an interconnect link, which introduces additional hops. As with many other techniques and protocols, more hops equal more latency; therefore, keeping remote access to a minimum is key to good performance. (More information about NUMA scheduling in ESX can be found in my previous article “Sizing VMs and NUMA nodes”.)
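
To get a feel for why this matters, here is a back-of-the-envelope sketch in Python (the latency figures are made-up example values, not measurements of a specific CPU): the effective memory latency a virtual machine experiences is roughly the weighted average of local and remote latency, so every extra percent of remote access adds up.

    # Hypothetical example values, only to illustrate the effect of remote access.
    LOCAL_LATENCY_NS = 100    # access within the home NUMA node
    REMOTE_LATENCY_NS = 160   # access across the interconnect (extra hop)

    def effective_latency_ns(remote_fraction):
        """Weighted average latency for a given share of remote memory accesses."""
        return (1 - remote_fraction) * LOCAL_LATENCY_NS + remote_fraction * REMOTE_LATENCY_NS

    for remote in (0.0, 0.25, 0.5):
        print(f"{remote:.0%} remote -> {effective_latency_ns(remote):.0f} ns average latency")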

If ESX detects it is running on a NUMA system, the NUMA load balancer assigns each virtual machine to a NUMA node (its home node). Because this assignment uses soft affinity, the memory scheduler preferentially allocates memory for the virtual machine from its home node. In previous versions (ESX 3.5 and 4.0) the complete virtual machine is treated as one NUMA client, but the total number of vCPUs of a NUMA client cannot exceed the number of CPU cores of a package (the physical CPU installed in a socket), and all vCPUs must reside within the NUMA node.

NUMA Client
If the total number of vCPUs of the virtual machine exceeds the number of cores in the NUMA node, the virtual machine is not treated as a NUMA client and thus is not managed by the NUMA load balancer.

Misalignment of NUMA client on NUMA node
Because the VM is not a NUMA client of the NUMA load balancer, the CPU scheduler performs no NUMA optimization. This means the vCPUs can be placed on any CPU core, and memory comes either from a single CPU or from all CPUs in a round-robin manner. Wide virtual machines tend to end up scheduled across all available CPUs.
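
The pre-4.1 decision can be sketched as a one-line check (my own simplification for illustration, not VMkernel code): a virtual machine is only managed as a NUMA client when all of its vCPUs fit inside a single NUMA node.

    def is_numa_client_esx40(vcpus, cores_per_node):
        """ESX 3.5/4.0 behaviour: the whole VM must fit in one NUMA node."""
        return vcpus <= cores_per_node

    # An 8-vCPU VM on a quad-core (4 cores per NUMA node) host:
    print(is_numa_client_esx40(vcpus=8, cores_per_node=4))  # False -> no NUMA optimization
    print(is_numa_client_esx40(vcpus=4, cores_per_node=4))  # True  -> one NUMA client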

Spanning VM as NON-NUMA Client
Wide-VMs
The ESX 4.1 CPU scheduler supports wide virtual machines. If the ESX 4.1 CPU scheduler detects a virtual machine containing more vCPUs than available cores in one NUMA node, it splits the virtual machine into multiple NUMA clients. At the virtual machine’s power-on, the CPU scheduler determines the number of NUMA clients that need to be created so that each client can reside within a NUMA node. Each NUMA client contains as many vCPUs as fit inside a NUMA node.

The CPU scheduler ignores Hyper-Threading; it only counts the number of available cores per NUMA node. An 8-way virtual machine running on a four-CPU quad-core Nehalem system is split into two NUMA clients, each containing four vCPUs. Although the Nehalem CPU offers 8 threads (4 cores plus 4 HT threads), the CPU scheduler still splits the virtual machine into multiple NUMA clients.
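
The splitting behaviour described above can be sketched as follows (a simplified model for illustration, not the actual scheduler code); note that only physical cores per node are counted, never HT threads:

    def split_into_numa_clients(vcpus, cores_per_node):
        """ESX 4.1 wide-VM behaviour (simplified): split the VM into NUMA clients,
        each holding as many vCPUs as fit in one NUMA node. HT threads are ignored."""
        clients = []
        remaining = vcpus
        while remaining > 0:
            clients.append(min(remaining, cores_per_node))
            remaining -= clients[-1]
        return clients

    # 8-way VM on a four-socket quad-core Nehalem system (4 cores per NUMA node):
    print(split_into_numa_clients(8, 4))   # [4, 4] -> two NUMA clients of four vCPUs each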

8 vCPU VM splitting into two NUMA Clients
The advantage of wide VM
The advantage of a wide VM is improved memory locality: instead of allocating memory pages randomly from any CPU, memory is allocated from the NUMA nodes the virtual machine is running on.
While reading the excellent whitepaper “VMware vSphere: The CPU Scheduler in VMware ESX 4.1”, one sentence caught my eye:

However, the memory is interleaved across the home nodes of all NUMA clients of the VM.

This means that the NUMA scheduler uses an aggregated memory locality of the VM to the set of NUMA nodes. Call it memory vicinity. The memory scheduler receives a list (called a node mask) of the NUMA nodes the virtual machine is scheduled on.

NUMA Node Mask
The memory scheduler will preferentially allocate memory for the virtual machine from this set of NUMA nodes, but it can distribute pages across all the nodes within this set. This means there is a possibility that a CPU in NUMA node 1 uses memory from NUMA node 2.

At first glance this looks like no improvement compared to the old situation, but fortunately wide-VM support makes a big difference. Wide VMs stop large virtual machines from scattering across all CPUs with no memory locality at all. Instead of distributing the vCPUs all over the system, using a node mask of NUMA nodes enables the VMkernel to make better memory allocation decisions for virtual machines spanning NUMA nodes.
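
A simple way to picture the difference (a toy model with example node numbers, not the memory scheduler itself): without wide-VM support, pages can land on any NUMA node in the system, while with wide-VM support they are interleaved only across the node mask of the VM’s home nodes.

    import itertools

    def interleave_pages(num_pages, node_mask):
        """Assign pages round-robin across the NUMA nodes in the node mask."""
        nodes = itertools.cycle(sorted(node_mask))
        return [next(nodes) for _ in range(num_pages)]

    all_nodes = {0, 1, 2, 3}          # four NUMA nodes in the host
    home_nodes = {0, 1}               # node mask of the wide VM's two NUMA clients

    print(interleave_pages(8, all_nodes))   # no node mask: [0, 1, 2, 3, 0, 1, 2, 3]
    print(interleave_pages(8, home_nodes))  # 4.1 wide VM:  [0, 1, 0, 1, 0, 1, 0, 1]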

Filed Under: NUMA Tagged With: NUMA, VMware, Wide-VM

DRS 4.1 Adaptive MaxMovesPerHost

August 27, 2010 by frankdenneman

Another reason to upgrade to vSphere 4.1 is the DRS adaptive MaxMovesPerHost parameter. The MaxMovesPerHost setting determines the maximum number of migrations per host for DRS load balancing. DRS evaluates the cluster and recommends migrations; by default this evaluation happens every 5 minutes. There is a limit to how many migrations DRS will recommend per interval per ESX host, because there is no advantage in recommending so many migrations that they won’t all be completed by the next re-evaluation, by which time demand could have changed anyway.
Be aware that there is no limit on moves per host for a host entering maintenance or standby mode; the limit only applies to load-balancing migrations. It can (but usually shouldn’t) be changed by setting the DRS advanced option “MaxMovesPerHost”. The default value is 8 and it is set at the cluster level. Remember, MaxMovesPerHost is a cluster setting, but it configures the maximum number of migrations from a single host on each DRS invocation. This means you can still see 30 or 40 vMotion operations in the cluster during a single DRS invocation.
In ESX/ESXi 4.1, the limit on moves per host is dynamic, based on how many moves DRS thinks can be completed in one DRS evaluation interval. DRS adapts to the frequency at which it is invoked (pollPeriodSec, default 300 seconds) and the average migration time observed during previous migrations. In addition, DRS follows the new maximum number of concurrent vMotion operations per host, which depends on the network speed (1GbE: 4 vMotions, 10GbE: 8 vMotions).
Due to the adaptive nature of the algorithm, the name of the setting is quite misleading, as it is no longer a hard maximum. The “MaxMovesPerHost” parameter still exists, but DRS may exceed its value. By leveraging the increased number of concurrent vMotion operations per host and the evaluation of previous migration times, DRS is able to rebalance the cluster in fewer passes. With fewer passes, the virtual machines receive their entitled resources much quicker, which should positively affect virtual machine performance.
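
To illustrate the idea, here is a rough approximation (my own sketch with made-up migration times, not the actual DRS formula): the number of useful moves per host is bounded by how many vMotions fit in one invocation interval, given the observed average migration time and the concurrent vMotion limit for the network speed.

    def estimated_moves_per_host(poll_period_sec=300, avg_migration_sec=60, network_gbps=1):
        """Approximate how many migrations from one host fit in a DRS invocation interval."""
        concurrent = 8 if network_gbps >= 10 else 4     # vSphere 4.1 concurrent vMotion limit
        waves = poll_period_sec // avg_migration_sec    # sequential migration "waves" per interval
        return int(waves * concurrent)

    print(estimated_moves_per_host(avg_migration_sec=60, network_gbps=1))    # 20
    print(estimated_moves_per_host(avg_migration_sec=30, network_gbps=10))   # 80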

Filed Under: DRS Tagged With: DRS, MaxMovesPerHost, VMware

DRS-FT integration

July 22, 2010 by frankdenneman

Another new feature of vSphere 4.1 is the DRS-Fault Tolerance integration. vSphere 4.1 allows DRS not only to perform initial placement of Fault Tolerance (FT) virtual machines, but also to migrate the primary and secondary virtual machines during DRS load-balancing operations. In vSphere 4.0, DRS is disabled on the FT primary and secondary virtual machines. When FT is enabled on a virtual machine in 4.0, the existing virtual machine becomes the primary virtual machine and is powered on on its registered host, while the newly spawned virtual machine, called the secondary virtual machine, is automatically placed on another host. DRS refrains from generating load-balancing recommendations for both virtual machines.
The new DRS integration removes both the initial-placement and the load-balancing limitation. DRS is able to select the most suitable host for initial placement and to generate migration recommendations for the FT virtual machines based on the current workload inside the cluster. This results in a more balanced cluster, which likely has a positive effect on the performance of the FT virtual machines. In vSphere 4.0 an anti-affinity rule prohibited the FT primary and secondary virtual machines from running on the same ESX host; vSphere 4.1 additionally offers the possibility to create a VM-Host affinity rule ensuring that the FT primary and secondary virtual machines do not run on ESX hosts in the same blade chassis, if the design requires this. For more information about VM-Host affinity rules, please visit this article.
Not only does the DRS-FT integration have a positive impact on the performance of the FT-enabled virtual machines, and arguably of all other VMs in the cluster, it also reduces the impact of FT-enabled virtual machines on the virtual infrastructure. For example, DPM is now able to move FT virtual machines to other hosts if it decides to place the current ESX host in standby mode; in vSphere 4.0, DPM needs to be disabled on at least two ESX hosts because of the DRS-disabled limitation I mentioned in this article.
Because DRS is able to migrate FT-enabled virtual machines, it can evacuate all virtual machines automatically when an ESX host is placed into maintenance mode. The administrator does not need to manually select an appropriate ESX host and migrate the virtual machines to it; DRS automatically selects a suitable host to run the FT-enabled virtual machines. This reduces the need for manual operations and for writing very “exciting” operational procedures on how to deal with FT-enabled virtual machines during the maintenance window.
DRS-FT integration requires EVC to be enabled on the cluster. Many companies do not enable EVC on their ESX clusters, based either on FUD about performance loss or on the argument that they do not intend to expand their clusters with new types of hardware and therefore build homogeneous clusters. The advantages and improvements DRS-FT integration offers, in both performance and reduced complexity of cluster design and operational procedures, shed some new light on the discussion about enabling EVC in a homogeneous cluster. If EVC is not enabled, vCenter reverts to the vSphere 4.0 behavior and sets DRS to disabled on the FT virtual machines.

Filed Under: DRS Tagged With: DRS, FT, integration, VMware

Disable DRS and VM-Host rules

July 22, 2010 by frankdenneman

vSphere 4.1 introduces DRS VM-Host affinity rules and offers two types of rules: mandatory (must run on / must not run on) and preferential (should run on / should not run on). When a mandatory rule is created, all ESX hosts not contained in the specified ESX host DRS group are marked as “incompatible” hosts, and DRS/vMotion tasks are rejected if an incompatible ESX host is selected.
A colleague of mine ran into the problem that mandatory VM-Host affinity rules remain active after disabling DRS; the product team explained the reason why:
By design, mandatory rules are considered very important, and the intended use case, licensing compliance, is considered so important that VMware decided to apply these restrictions to non-DRS operations in the cluster as well.
If DRS is disabled while mandatory VM-Host rules still exist, the rules remain in effect and the cluster continues to track, report and alert on them. If a vMotion would violate a mandatory VM-Host affinity rule, even after DRS has been disabled, the cluster still rejects the vMotion.
Mandatory rules can only be disabled if the administrator explicitly does so. If the intent is to disable DRS, remove the mandatory rules first before disabling DRS.
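
The effect can be sketched as a simple compatibility check (hypothetical names and data structures, not the vCenter implementation): with a mandatory rule in place, a vMotion to an incompatible host is rejected whether or not DRS is enabled.

    def vmotion_allowed(vm, target_host, mandatory_rules):
        """Reject a vMotion that would violate a mandatory VM-Host affinity rule."""
        for rule in mandatory_rules:
            if vm not in rule["vm_group"]:
                continue
            in_group = target_host in rule["host_group"]
            if rule["type"] == "must_run_on" and not in_group:
                return False
            if rule["type"] == "must_not_run_on" and in_group:
                return False
        return True

    rule = {"type": "must_run_on", "vm_group": {"oracle-vm01"}, "host_group": {"esx01", "esx02"}}
    print(vmotion_allowed("oracle-vm01", "esx03", [rule]))   # False: esx03 is "incompatible"
    print(vmotion_allowed("oracle-vm01", "esx02", [rule]))   # True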

Filed Under: DRS Tagged With: Disable DRS, VM-Host affinity rule, VMware

