Help, my DRS cluster is not load balancing!

Unfortunately I still see this cry for help appearing on the VMTN forums and on Twitter, usually accompanied by screenshots like this:

[Screenshot: DRS cluster with unbalanced memory utilization]

This screen doesn’t really show you whether your DRS cluster is balanced or not. It just shows whether the virtual machines receive the resources they are entitled to. The reason I don’t use the word demand is that DRS calculates priority based on virtual machine and resource pool resource settings and on resource availability.
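
As a back-of-the-envelope illustration (emphatically not the actual DRS algorithm), entitlement can be thought of as demand bounded by the reservation and the limit, with shares deciding how contended capacity is divided. All names and numbers below are hypothetical:

```python
def entitlement(demand, reservation, limit, shares, total_shares,
                contended_capacity):
    """Toy model of a resource entitlement: take the VM's
    shares-proportional slice of the contended capacity, never exceed
    its demand or configured limit, never drop below its reservation."""
    proportional = contended_capacity * (shares / total_shares)
    return max(reservation, min(demand, limit, proportional))

# Hypothetical VM (all values in MB): demands 4096, reserves 1024,
# limited to 8192, owns 2000 of 10000 shares of a contended 16384 MB host.
print(entitlement(demand=4096, reservation=1024, limit=8192,
                  shares=2000, total_shares=10000,
                  contended_capacity=16384))  # -> 3276.8
```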

To understand whether a virtual machine received the resources it requires, hover over the bar and find the virtual machine. A new window is displayed with the metric “Entitled Resources Delivered”.

[Screenshot: VM resource entitlement]

DRS attempts to provide the resources requested by the virtual machine. If the current host is not able to provide them, DRS moves the virtual machine to another host that is. If the virtual machine is receiving the resources it requires, there is no need to move it to another host. Moves by DRS consume resources as well, and you don’t want to waste resources on unnecessary migrations.

To avoid wasting resources, DRS calculates two metrics: the current host load standard deviation and the target host load standard deviation. These metrics indicate how far the current load of each host deviates from the ideal load. The migration threshold determines how far these two metrics may lie apart before the distribution of virtual machines needs to be reviewed. The web client contains this cool water-level image that indicates the overall cluster balance. It can be found on the cluster summary page and should be used as the default indicator of the cluster resource status.
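
To make the balance metric concrete, here is a minimal Python sketch of how a host load standard deviation can be computed: normalize each host’s load as the sum of VM entitlements divided by host capacity, then take the standard deviation across hosts. The 0.2 threshold and all values are assumptions for illustration, not DRS internals:

```python
import statistics

def host_load_stddev(host_loads):
    """Standard deviation of normalized host loads, where each load is
    the sum of VM entitlements on the host divided by host capacity."""
    return statistics.pstdev(host_loads)

# Hypothetical cluster: entitlements and capacities per host (MB).
hosts = [
    {"entitled": 40960, "capacity": 65536},
    {"entitled": 20480, "capacity": 65536},
    {"entitled": 61440, "capacity": 65536},
]
loads = [h["entitled"] / h["capacity"] for h in hosts]
current = host_load_stddev(loads)

# An assumed migration threshold of 0.2 decides whether DRS should even
# look for moves; below it, the cluster counts as balanced.
THRESHOLD = 0.2
print(f"current host load stddev: {current:.3f}",
      "-> rebalance" if current > THRESHOLD else "-> balanced")
```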

[Screenshot: Host Load Standard Deviation]

One of the main arguments I hear is that a host contains more than CPU and memory resources alone. Multiple virtual machines located on one host can stress or saturate the network and storage paths extensively, whereas a better distribution of virtual machines across the hosts would also result in a better distribution of load at the storage and network path layer. This is a very valid argument; however, DRS is designed to take care of CPU and memory resource distribution and is therefore unable to take these other resource consumption constraints into account.

In reality DRS takes a lot of metrics into account during its load-balancing task. For more in-depth information I recommend reading the articles “DRS and memory balancing in non-overcommitted clusters” and “Disabling MinGoodness and CostBenefit”.

3 common questions about DRS preferential VM-Host affinity rules

On a regular basis I receive questions about the behavior of DRS when dealing with preferential VM to Host affinity rules. Rules configured with the rule set “should run on / should not run on” are considered preferential, meaning that DRS prefers to satisfy the requirements of the rules but is somewhat flexible about running a VM outside the designated hosts. It is this flexibility that raises questions; let’s see how “loosely” DRS can operate within the terms of a preferential rule:

Question 1: If the cluster is imbalanced, does DRS migrate virtual machines out of the DRS host group?
DRS only considers migrating virtual machines to hosts external to the DRS host group if each host inside the group is 100% utilized. And if the hosts are 100% utilized, DRS will consider virtual machines that are not part of a VM-Host affinity rule first. DRS will always avoid violating an affinity rule.

Question 2: When a virtual machine is powered on, will DRS start the virtual machine on a host external to the DRS host group?
By default DRS starts the virtual machine on hosts listed in the associated Host DRS group. If all hosts in the group are 100% utilized, or if they do not meet the virtual machine’s hardware requirements, such as datastore or network connectivity, then DRS will start the virtual machine on a host external to the Host DRS group.

Question 3: If a virtual machine is running on a host external to the associated Host DRS group, will DRS try to migrate the virtual machine to a host listed in the DRS host group?
The first action DRS triggers during an invocation is to determine whether an affinity rule is violated. If a virtual machine is running on a host external to the associated Host DRS group, DRS will try to correct this violation. This move receives the highest priority, ensuring that it is carried out during that invocation.
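
The three answers above boil down to a simple ordering that can be sketched in a few lines of Python. The data structures, the 100% saturation check and all example values are hypothetical simplifications of what DRS actually evaluates:

```python
def candidate_hosts(vm_name, rule, hosts):
    """Pick placement candidates for a VM with a preferential
    ("should run on") rule: prefer the rule's host group and only fall
    back to outside hosts when every preferred host is fully utilized
    or incompatible (e.g. missing datastore/network connectivity)."""
    preferred = [h for h in hosts
                 if h["name"] in rule["host_group"] and h["compatible"]]
    usable = [h for h in preferred if h["utilization"] < 1.0]
    if usable:
        return usable
    # All preferred hosts are 100% utilized or incompatible, so the
    # preferential rule allows running outside the group.
    return [h for h in hosts
            if h["name"] not in rule["host_group"] and h["compatible"]]

# Hypothetical cluster: both preferred hosts are saturated.
rule = {"host_group": {"esx01", "esx02"}}
hosts = [
    {"name": "esx01", "utilization": 1.0, "compatible": True},
    {"name": "esx02", "utilization": 1.0, "compatible": True},
    {"name": "esx03", "utilization": 0.4, "compatible": True},
]
print([h["name"] for h in candidate_hosts("vm1", rule, hosts)])  # ['esx03']
```

Correcting an existing violation (question 3) amounts to rerunning this selection at the start of every invocation: as soon as a preferred host has capacity again, the virtual machine is moved back with the highest priority.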

Migrating between datastore clusters by changing storage profiles in a vCloud environment

vCloud Director 5.1 supports the use of both storage profiles and Storage DRS. One of the coolest, and unfortunately relatively unknown, features is the ability to live migrate virtual machines between datastore clusters by changing the storage profile in the vCloud Director portal.

In my lab I’ve set up a provider vDC that contains two compute clusters. Each compute cluster connects to two datastore clusters. Datastore cluster “vCloud-DSC-Gold” is compatible with the VM storage profile “vCloud-Gold-Storage”, while datastore cluster “vCloud-DSC-Silver” is compatible with the VM storage profile “vCloud-Silver-Storage”.

[Screenshot]

When creating a vApp, the default storage profile of the organization vDC is applied to the vApp and all its virtual machines. In this case, the VM storage profile Gold is applied to all the virtual machines in the vApp.

You can determine which VM Storage Profile is associated with the virtual machine by selecting the properties of the virtual machine in the “My Cloud” tab. Please note that vCloud Director does not show the VM Storage Profile at the vApp level!

[Screenshot]

By selecting the drop-down box, all storage profiles that are associated with the organization vDC are displayed.

[Screenshot]

By selecting the storage profile “vCloud-Silver-Storage”, vCloud Director determines that the virtual machine is stored on a datastore that is not compatible with the associated storage profile. In other words, the current configuration violates the storage-level policy.

To correct this violation, vCloud Director instructs vSphere to migrate the virtual machine via Storage vMotion to a datastore that is compatible with the VM storage profile. In this case the datastore cluster “vCloud-DSC-Silver” is selected as the destination. Storage DRS determines the most suitable datastore by using its initial placement algorithm, selecting the datastore with the most free space and the lowest I/O load.
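
As a simplified illustration of that initial placement decision, the sketch below ranks the datastores of a datastore cluster on free space, with I/O load as the tie-breaker. The numbers are made up, and the real Storage DRS algorithm also models space growth and I/O latency:

```python
def pick_datastore(datastores):
    """Toy initial placement: prefer the datastore with the most free
    space, using the lowest I/O load as the tie-breaker."""
    return max(datastores, key=lambda d: (d["free_gb"], -d["io_load"]))

# Hypothetical datastores in the target datastore cluster.
cluster_silver = [
    {"name": "nfs-f-vcloud05", "free_gb": 220, "io_load": 0.35},
    {"name": "nfs-f-vcloud06", "free_gb": 410, "io_load": 0.10},
]
print(pick_datastore(cluster_silver)["name"])  # nfs-f-vcloud06
```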

To demonstrate the feature, I selected the virtual machine “W2K8_R2-SP1”. The VM storage profile “vCloud-Gold-Storage” is applied and Storage DRS determined that the datastore “nfs-f-vcloud03” of the datastore cluster “vCloud-DSC-Gold” was the most suitable location.

[Screenshot]

By changing the storage profile to “vCloud-Silver-Storage”, vCloud Director instructed vSphere to migrate the virtual machine to the datastore cluster that is compatible with the newly associated VM storage profile.

[Screenshot]

When logging into the vCenter server managing the ESXi hosts, the following task is shown running:

[Screenshot]

After the task is complete, vCenter shows that the virtual machine is now stored on datastore “nfs-f-vcloud06” in the datastore cluster “vCloud-DSC-Silver”.

[Screenshot]

The power of abstraction
The abstraction layer of vCloud Director makes this possible. When changing the storage profile directly at the vSphere layer, nothing happens: vSphere will not migrate the virtual machine to the datastore cluster that is compatible with the selected VM storage profile.

Useful for stretched clusters?
The reason I was looking into this feature in my lab is a conversation with my esteemed colleagues Lee Dilworth and Aidan Dalgleish. We were looking into an alternative scenario for a stretched cluster. By leveraging the elastic vDC feature of vCloud Director, a separate DRS cluster is created in each site. Because initial placement at the compute level is automatic, we needed to find a construct that provides a more deterministic method of virtual machine placement. We immediately thought of the VM storage profile feature: create two datastore clusters, one per site, and associate a storage profile named after the site with the respective datastore cluster.

[Screenshot]

When creating the vApp, just select the site-related storage profile to place the virtual machine in a specific site. Due to the compatibility check, vCloud Director determines that, in order to be compliant with the storage profile, it must place the virtual machine on the compute cluster in the same site. For example, if you want to place a virtual machine in site 1, select the VM storage profile “site 1”. vCloud Director determines that the virtual machine needs to be stored in datastore cluster “DSC-Site-1”. The compute cluster in site 1 is the only compute cluster connected to that datastore cluster; therefore both the compute and storage configuration of the virtual machine end up in site 1.

This configuration works perfectly if you want to simplify initial placement when you have multiple sites/locations and always want to keep the virtual machine in the same site. However, this solution might not be optimal for a stretched cluster configuration where failover to another site is necessary.

Connectivity to all datastores necessary
Because this feature uses Storage vMotion instead of a combined cross-host and datastore vMotion, the compute cluster needs to be connected to both datastore clusters.

[Screenshot]
When selecting a different storage profile, the storage state is migrated to another datastore cluster. However, it doesn’t move the compute state of the virtual machine. This means that storage is moved to site B, while the compute state remains in site A. vCloud Director does not provide an option to migrate the virtual machine to a different compute cluster within the provider vDC. You can either solve this by logging into the vCenter server that manages the ESXi hosts and manually vMotioning the virtual machines to the cluster in site B, or by powering off the virtual machine in vCloud Director, changing the storage profile and powering it on again. Neither “solution” is a very enterprise-level scenario; therefore I think this is not yet suitable as a stretched cluster configuration.

Saving a Resource Pool Structure web client feature not suitable for vCD environments

Last week I published the article “Saving a Resource Pool Structure” describing the RP-tree backup and restore feature of the vSphere 5.1 web client. Multiple people immediately asked whether the feature keeps the Managed Object Reference ID (MoRef) of the resource pools identical when it restores the resource pool tree. This is important for vCloud Director, as it creates a relationship between the vCloud Director organization vDC object and the vSphere-level resource pool: vCloud Director ties the org vDC UUID to the vSphere resource pool MoRef ID within the vCD database. For more information read Chris’s post: “Gotcha: Disabling VMware DRS with vCloud Director“.

Unfortunately the feature just captures the old tree structure and rebuilds a new tree structure. I tested it by using William Lam’s custom Perl script called moRefFinder.pl. Please visit William’s site to download his script.
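
For those who prefer Python over Perl, a roughly equivalent lookup can be sketched with pyvmomi. This is an illustrative alternative to William’s script, not a replacement; the hostname and credentials are placeholders:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details for the vCenter managing the cluster.
si = SmartConnect(host="vcenter.lab.local",
                  user="administrator@vsphere.local", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Walk all resource pools and print name plus MoRef ID, so the IDs can
# be compared before and after a backup/restore cycle.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ResourcePool], True)
for rp in view.view:
    print(rp.name, rp._moId)  # e.g. "00-Infra-mgmt resgroup-129"
Disconnect(si)
```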

[Screenshot: MoRef IDs before disabling DRS]

Then I proceeded to back up and restore the resource pool tree. vCenter showed the following tasks being processed.

[Screenshot: vCenter operations]

Then I checked if the MoRef ID was the same as prior to disabling DRS.

[Screenshot: MoRef IDs after restoring the tree]

As shown, the current MoRef ID of the “00-Infra-mgmt” resource pool is 137, contrary to the MoRef ID of 129 before disabling DRS.

Therefore you should not use this feature when planning to back up and restore the resource pools used by vCD for its organization vDC structures.

Saving a Resource Pool Structure

During a troubleshooting exercise of a problem with vCenter, I needed to disable DRS to make sure DRS was not the culprit. However, a resource pool tree existed in the infrastructure and I was not looking forward to reconfiguring all the resource allocation settings again and documenting which VM belonged to which resource pool. The web client of vSphere 5.1 has a cool feature that helps in these cases. When deactivating DRS (select the cluster, Manage, Settings, Edit, deselect “Turn ON vSphere DRS”), the user interface displays the following question:

[Screenshot: Turn OFF DRS dialog]

Backup resource pool tree
Click “Yes” to back up the tree and select an appropriate destination for the resource pool tree snapshot file. This file uses the naming structure clustername.snapshot and should be no bigger than 1 or 2 KB.

Restore resource pool tree
When enabling DRS on the cluster, the user interface does not ask whether to restore the tree. In order to restore the tree, enable DRS first and select the cluster in the tree view. Open the submenu by right-clicking the cluster, expand “All vCenter Actions” and select the option “Restore Resource Pool Tree…”

[Screenshot: Restore Resource Pool Tree menu option]

A window appears; click Browse to select the saved resource pool tree snapshot and click OK.

[Screenshot: select cluster snapshot]

vCenter restores the tree, the resource pool settings (shares, reservations, limits) and moves the virtual machines back to the resource pools they were placed in before disabling DRS.

[Screenshot: restored resource pool tree]

If you want to save the complete vCenter inventory configuration I suggest you download the fling “InventorySnapshot”.

Update: If you want to use this tool to back up and restore resource pool trees used by vCloud Director, please read this article: Saving a Resource Pool Structure web client feature not suitable for vCD environments