Overall the vSphere 5.1 web client attempts to mimic the menu and settings workflows of the (old) vSphere client. When editing the settings of a datastore cluster, the web client provides the same set of editable options as the vSphere client. However, certain overviews and menus have changed in the vSphere 5.1 web client, for example the VM overrides screen. The primary purpose of the VM overrides screen is to display the virtual machines inside the datastore cluster whose Storage DRS automation level deviates from the cluster default.
VM overrides and Virtual Machine Settings screens
The VM overrides screen is located in the storage view: select the datastore cluster, select the Manage tab and click the Settings button.
The VM overrides screen replaces the Virtual Machine Settings screen of the datastore cluster settings in the vSphere client.
Difference in default behavior
As you might have noticed, the web client is not listing any virtual machines, while the Virtual Machine Settings overview in the vSphere client shows 5 virtual machines and a VM template. As mentioned in the introduction, the primary purpose of the VM overrides screen has changed compared to the Virtual Machine Settings overview in the vSphere client. The VM overrides screen only displays a virtual machine if it is set to a non-default automation level.
To demonstrate the different behavior, I have changed the automation level of VM3, VM4 and VM5. The datastore cluster is configured with a manual automation mode, therefore the default automation level is Default (Manual). The previous screenshot shows that all virtual machines are configured with the Default (Manual) automation level; VM3 is changed to Fully Automated, VM4 to Manual and VM5 to Disabled. If you want to reproduce this behavior in your own environment, change the automation level in the vSphere client and then go to the VM overrides screen in the web client to see the modified virtual machines listed.
The VM overrides screen displays the following:
Even though VM4 is configured with the same automation level as the datastore cluster, the VM overrides screen displays VM4 because it is set explicitly rather than configured with the default automation mode. By changing the automation mode back to Default (Manual) via the Edit screen, VM4 is removed from the VM overrides list.
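For those who prefer to check this outside the UI, the same per-VM override information is exposed through the vSphere API. Below is a minimal pyVmomi sketch that lists the override entries of a datastore cluster; the vCenter hostname, credentials and cluster name are placeholders rather than values from my lab, and the assumption is that the vmConfig list corresponds to the population the VM overrides screen shows.

```python
# Minimal pyVmomi sketch -- hostname, credentials and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Find the datastore cluster (StoragePod) by name.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == "Tier-1 VMs and VMDKs")
view.Destroy()

drs_config = pod.podStorageDrsEntry.storageDrsConfig
print("Cluster default behavior:", drs_config.podConfig.defaultVmBehavior)

# vmConfig holds only the per-VM entries -- the VMs with their own settings.
for vm_cfg in drs_config.vmConfig:
    print(vm_cfg.vm.name, "enabled:", vm_cfg.enabled, "behavior:", vm_cfg.behavior)

Disconnect(si)
```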
To be honest, it took me a while to get used to the new functionality of this screen. I would like to know whether you like this new behavior or whether you prefer the way the Virtual Machine Settings view in the old vSphere client works.
Manual storage vMotion migrations into a datastore cluster
Frequently I receive questions about the impact of a manual migration into a datastore cluster, especially about the impact on the VM disk file layout. Will Storage DRS take the initial disk layout into account or will it be changed? The short answer is that the virtual machine disk layout will be changed by the default affinity rule configured on the datastore cluster. This article describes several scenarios of migrating “distributed” and “centralized” disk layout configurations into datastore clusters configured with different affinity rules.
Test scenario architecture
For the test scenarios I’ve built two virtual machines, VM1-centralized and VM2-distributed. Both virtual machines have an identical configuration; only the datastore location is different. VM1-centralized has a “centralized” configuration, storing all VMDKs on a single datastore, while VM2-distributed has a “distributed” configuration, storing its VMDKs on separate datastores.
Hard disk | Size | VM 1 datastore | VM 2 datastore |
Working directory | 8GB | FD-X4 | FD-X4 |
Hard disk 1 | 60GB | FD-X4 | FD-X4 |
Hard disk 2 | 30GB | FD-X4 | FD-X5 |
Hard disk 3 | 10GB | FD-X4 | FD-X6 |
Two datastore clusters exist in the virtual infrastructure:
Datastore cluster | Default Affinity rule | VMDK rule applied on VM |
Tier-1 VMs and VMDKs | Do not keep VMDKs together | Intra-VM Anti-affinity rule |
Tier-2 VMs and VMDKs | Keep VMDKs together | Intra-VM Affinity rule |
Test 1: VM1-centralized to Datastore Cluster Tier-2 VMs and VMDKs
Since the virtual machine is stored on a single datastore, it makes sense to start off by migrating the virtual machine to the datastore cluster that applies a VMDK affinity rule, keeping the virtual machine disk files together on a single datastore in the datastore cluster. Right-click the virtual machine to display the submenu and select the option “Migrate…”. The first step is to select the migration type; select “Change datastore”.
The second step is to select the destination datastore. As we are planning to migrate the virtual machine to a datastore cluster, it is necessary to select the datastore cluster object.
After clicking next, the user interface displays the Review Selection screen; notice that the datastore cluster applied the default cluster affinity rule.
Storage DRS has evaluated the current load of the datastore cluster and the configuration of the virtual machine, and concludes that datastore nfs-f-05 is the best fit for the virtual machine, the existing virtual machines in the datastore cluster and the load balance state of the cluster. By clicking “more recommendations” other datastore destinations are presented.
Test result: Intra-VM affinity rule applied and all virtual machine disk files are stored on a single datastore.
Selecting the Datastore cluster object
The user interface provides two options: select the datastore cluster object, or select a datastore that is part of the datastore cluster; for the latter option you explicitly need to disable Storage DRS for this virtual machine. By selecting the datastore cluster, you fully leverage the strength of Storage DRS. Storage DRS initiates its algorithms and evaluates the current state of the datastore cluster. It reviews the configuration of the new virtual machine and is aware of the I/O load of each datastore as well as the space utilization. Storage DRS weighs both metrics and will weigh either space or I/O load heavier if its utilization is higher.
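If you drive this through the API instead of the UI, selecting the datastore cluster object corresponds to asking the StorageResourceManager for a placement recommendation. The sketch below is a rough outline of that call for a storage-only relocation, assuming `si`, `vm` and `pod` have already been resolved as in the earlier sketch; treat it as an illustration of the flow rather than a complete implementation.

```python
# Rough sketch: request a Storage DRS recommendation for relocating an existing
# VM into a datastore cluster, then apply the top recommendation.
from pyVmomi import vim

def relocate_into_pod(si, vm, pod):
    srm = si.content.storageResourceManager

    placement_spec = vim.storageDrs.StoragePlacementSpec()
    placement_spec.type = "relocate"          # placement type for a storage migration
    placement_spec.vm = vm
    placement_spec.podSelectionSpec = vim.storageDrs.PodSelectionSpec(storagePod=pod)

    result = srm.RecommendDatastores(storageSpec=placement_spec)
    # result.recommendations is the ranked list the UI shows behind
    # "more recommendations"; each entry carries a key that can be applied.
    best = result.recommendations[0]
    return srm.ApplyStorageDrsRecommendation_Task(key=[best.key])
```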
Disable Storage DRS for this virtual machine
By default it’s not possible to select a specific datastore that is part of a datastore cluster during the second step, “Select Datastore”. In order to do that, one must tick the option box “Disable Storage DRS for this virtual machine”. By doing so, the datastores in the lower part of the screen become available for selection. However, this means that the virtual machine will be excluded from any Storage DRS load balancing operation. Not only does this affect the virtual machine itself, it also impacts other Storage DRS operations such as maintenance mode and datastore cluster defragmentation. As Storage DRS is not allowed to move the virtual machine, it cannot migrate it to reach an optimum load balance state when Storage DRS needs to make room for an incoming virtual machine. For more information about cluster defragmentation, read the following article: Storage DRS initial placement and datastore cluster defragmentation.
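Behind that checkbox is simply a per-VM entry in the datastore cluster configuration with Storage DRS disabled. The sketch below shows roughly how that override looks when set programmatically with pyVmomi; `pod` and `vm` are assumed to be already-resolved objects, and whether the entry needs an "add" or an "edit" operation depends on whether an override already exists for the VM.

```python
# Sketch: the per-VM override behind "Disable Storage DRS for this virtual machine".
from pyVmomi import vim

def disable_sdrs_for_vm(si, pod, vm):
    vm_cfg = vim.storageDrs.VmConfigInfo(vm=vm, enabled=False)
    # Use operation="edit" instead if an override entry already exists for this VM.
    vm_spec = vim.storageDrs.VmConfigSpec(info=vm_cfg, operation="add")
    drs_spec = vim.storageDrs.ConfigSpec(vmConfigSpec=[vm_spec])

    return si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=drs_spec, modify=True)
```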
Test 2: VM1-centralized to Datastore Cluster Tier-1 VMs and VMDKs
Migrating a virtual machine stored on a single datastore to a datastore cluster with anti-affinity rules enabled results in a distribution of the virtual machine disk files:
Test result: Intra-VM anti-affinity rule applied and the virtual machine disk files are placed on separate datastores.
Working directory and default anti-affinity rules
Please note that in the previous scenario the configuration file (working directory) is placed on the same datastore as Hard disk 3. Storage DRS does not forcefully attempt to place the working directory on a different datastore; it weighs the load balance state of the cluster heavier than separating the working directory from the virtual machine’s VMDK files.
Test 3: VM2-distributed to Datastore Cluster Tier-1 VMs and VMDKs
Following the example of VM1, I started off by migrating VM2-distributed to Tier-1, as that datastore cluster is configured to mimic the initial state of the virtual machine, namely to distribute the virtual machine across as many datastores as possible. After selecting Datastore Cluster Tier-1 VMs and VMDKs, Storage DRS provided the following recommendation:
Test result: Intra-VM anti-affinity rule applied on VM and the virtual machine disk files are stored on separate datastores.
A nice tidbit: as every virtual disk file is migrated between two distinct datastores, this scenario leverages the parallel disk migration functionality introduced in vSphere 5.1.
Test 4: VM2-distributed to Datastore Cluster Tier-2 VMs and VMDKs
What happens if you migrate a distributed virtual machine to a datastore cluster configured with a default affinity rule? After selecting Datastore Cluster Tier-2 VMs and VMDKs, Storage DRS provided the following recommendation:
Test result: Intra-VM affinity rule applied to the VM and all virtual machine disk files are placed on a single datastore.
Test 5: VM2-distributed to Multiple Datastore clusters
A common use case is to distribute a virtual machine across multiple tiers of storage to provide performance while taking economics into account. This test simulates the exercise of placing the working directory and guest OS disk (Hard disk 1) on datastore cluster Tier-2 and the database and logging hard disks (Hard disk 2 and Hard disk 3) on datastore cluster Tier-1.
In order to configure the virtual machine to use multiple datastores, click the Advanced button during the second step of the migration:
This screen shows the current configuration; by selecting the current datastore of a hard disk, a browse menu appears:
Select the appropriate datastore cluster for each hard disk and click on next to receive the destination datastore recommendation from Storage DRS.
The working directory of the VM and Hard disk 1 are stored on datastore cluster Tier 2 and Hard disk 2 and Hard disk 3 are stored in datastore cluster Tier 1.
As datastore cluster Tier-2 is configured to keep the virtual machine files together, both the working directory (designated as Configuration file in the UI) and Hard disk 1 are placed on datastore nfs-f-05. A default anti-affinity rule is applied to all new virtual machines in datastore cluster Tier-1, therefore Storage DRS recommends placing Hard disk 2 on nfs-f-07 and Hard disk 3 on datastore nfs-f-01.
Test result: Intra-VM anti-affinity rule applied to the disks in Tier-1. The files stored in Tier-2 are placed on a single datastore, while the virtual machine disk files stored in the Tier-1 datastore cluster are located on different datastores.
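For completeness, the resulting layout can also be described directly with per-disk locators in a relocate spec. The sketch below shows that API shape; note that it targets datastores explicitly and therefore bypasses the Storage DRS placement logic used in this test, so it only illustrates how a split working-directory/data-disk layout is expressed. The datastore arguments are assumptions.

```python
# Sketch: describe a split layout with per-disk locators (bypasses Storage DRS placement).
from pyVmomi import vim

def split_disks(vm, config_ds, data_ds):
    """Keep the working directory and the first disk on config_ds, the rest on data_ds."""
    spec = vim.vm.RelocateSpec(datastore=config_ds)   # target for the working directory

    disks = [d for d in vm.config.hardware.device
             if isinstance(d, vim.vm.device.VirtualDisk)]
    spec.disk = [
        vim.vm.RelocateSpec.DiskLocator(
            diskId=disk.key,
            datastore=config_ds if index == 0 else data_ds)
        for index, disk in enumerate(disks)
    ]
    return vm.RelocateVM_Task(spec)
```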
Initial VM configuration | Cluster default affinity rule | Result | Configured on |
Centralized | Affinity rule | Centralized | Entire VM |
Centralized | Anti-Affinity rule | Distributed | Entire VM |
Distributed | Anti-Affinity rule | Distributed | Entire VM |
Distributed | Affinity rule | Centralized | Entire VM |
Distributed | Affinity rule | Centralized | Working directory + Hard disk 1 |
Distributed | Anti-Affinity rule | Distributed | Hard disk 2 and Hard disk 3 |
All migrations performed through the UI lead to a successful integration with the datastore cluster. Every migration results in the application of the correct affinity or anti-affinity rule, as set by the default affinity rule of the cluster.
Storage DRS and Storage vMotion bugs solved in vSphere 5.0 Update 2.
Today Update 2 for vSphere ESXi 5.0 and vCenter Server 5.0 were released. I would like to highlight two bugs that have been fixed in this update, one for Storage DRS and one for Storage vMotion.
Storage DRS
vSphere ESXi 5.0 Update 2 was released today and it contains a fix that should be interesting to customers running Storage DRS on vSphere 5.0. The release note states the following bug:
Adding a new hard disk to a virtual machine that resides on a Storage DRS enabled datastore cluster might result in Insufficient Disk Space error
When you add a virtual disk to a virtual machine that resides on a Storage DRS enabled datastore and if the size of the virtual disk is greater than the free space available in the datastore, SDRS might migrate another virtual machine out of the datastore to allow sufficient free space for adding the virtual disk. Storage vMotion operation completes but the subsequent addition of virtual disk to the virtual machine might fail and an error message similar to the following might be displayed:
Insufficient Disk Space
In essence, Storage DRS made room for the incoming virtual disk, but the subsequent placement of that disk failed. This update fixes a bug in the datastore cluster defragmentation process. For more information about datastore cluster defragmentation, read the article: Storage DRS initial placement and datastore cluster defragmentation.
Storage vMotion
vCenter Server 5.0 Update 2 contains a fix that allows you to rename your virtual machine files with a Storage vMotion.
vSphere 5 Storage vMotion is unable to rename virtual machine files on completing migration
In vCenter Server, when you rename a virtual machine in the vSphere Client, the vmdk disks are not renamed following a successful Storage vMotion task. When you perform a Storage vMotion of the virtual machine to have its folder and associated files renamed to match the new name, the virtual machine folder name changes, but the virtual machine file names do not change.
Duncan and I knew how many customers were relying on this feature for operational processes and pushed heavily to get it back in. We are very pleased to announce it’s back in vSphere 5.0; unfortunately this fix is not available in 5.1 yet!
For more info about the fixes in the updates please review the release notes:
ESXi 5.0: https://www.vmware.com/support/vsphere5/doc/vsp_esxi50_u2_rel_notes.html
vCenter 5.0: https://www.vmware.com/support/vsphere5/doc/vsp_vc50_u2_rel_notes.html
Multi-NIC vMotion – failover order configuration
After posting the article “Designing your vMotion network” I quickly received the question of which failover order configuration is better. Is it better to configure the redundant NIC(s) as standby or as unused? The short answer: always use standby and never unused!
Tomas Fojta posted the comment that it does not make sense to place the NICs into standby mode:
In the scenario as depicted in the diagram I prefer to use active/unused. If you think about it the standby option does not give you anything as when one of the NICs fails both vmknics will be on the same NIC which does not give you anything.
Although routing vMotion traffic across the same physical NIC during a NIC failure does not provide any performance benefit, there are two important reasons for providing a redundant connection to each vmknic:
- Abstraction layer
- Using a default interface for management traffic
Abstraction layer
As mentioned in the previous article, vMotion operations are done at the vmknic level. Because vMotion focuses on the vmknic layer instead of the physical layer, some details are abstracted from the vMotion load balancing logic, such as the physical NIC’s link state or health. Due to this abstraction, vMotion simply selects the appropriate vmknics for load balancing network packets and trusts that there is connectivity between the vmknics of the source and destination host.
Default interface
Although vMotion is able to use multiple vmknics to load balance traffic, vMotion assigns one vmknic as its default interface and prefers to use this for connection management and some trivial management transmissions. *
As such, if you’ve got multiple physical NICs on a host that you plan to use for vMotion traffic, it makes sense to mark them as standby NICs for the other vMotion vmknics on the host. That way, even if you lose a physical NIC, you won’t see vMotion network connectivity issues.
This means that if you have designated three physical NICs for vMotion your vmknic configuration should look as follows:
vmknic | Active NIC | Standby NICs |
vmknic0 | NIC1 | NIC2, NIC3 |
vmknic1 | NIC2 | NIC1, NIC3 |
vmknic2 | NIC3 | NIC1, NIC2 |
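To translate that table into a configuration, here is a hedged pyVmomi sketch for a standard vSwitch setup: each vMotion port group gets one active uplink and the remaining uplinks as standby. The port group and vmnic names are examples, not taken from a specific environment.

```python
# Sketch: set the active/standby failover order of a vMotion port group (standard vSwitch).
from pyVmomi import vim

def set_failover_order(host, portgroup_name, active, standby):
    net_sys = host.configManager.networkSystem

    # Reuse the existing port group spec so the vSwitch and VLAN settings are kept.
    pg = next(p for p in net_sys.networkInfo.portgroup if p.spec.name == portgroup_name)
    spec = pg.spec
    spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy(
        nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
            activeNic=active, standbyNic=standby))
    net_sys.UpdatePortGroup(pgName=portgroup_name, portgrp=spec)

# Matching the table above (example port group and vmnic names):
# set_failover_order(host, "vMotion-01", ["vmnic1"], ["vmnic2", "vmnic3"])
# set_failover_order(host, "vMotion-02", ["vmnic2"], ["vmnic1", "vmnic3"])
# set_failover_order(host, "vMotion-03", ["vmnic3"], ["vmnic1", "vmnic2"])
```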
By placing the redundant NICs into standby instead of unused, you avoid the risk of an unstable vMotion network. If a NIC fails, you might experience some vMotion performance degradation as the traffic gets routed through the same NIC, but you can trust your vMotion network to correctly migrate all virtual machines off the host so you can replace the faulty NIC.
* Word to the wise
By writing about the fact that vMotion designates a vmknic as the default interface, I’m aware that this sparks the interest of some of the creative minds in our community. Please do not attempt to figure out which vmknic is designated as the default interface and make that specific vmknic redundant and different from the rest. To paraphrase Albert Einstein: “Simplicity is the root of all genius”. Keep your Multi-NIC vMotion configuration consistent and identical within the host and throughout all hosts. This saves you a lot of frustration during troubleshooting. Being able to depend on your vMotion network to migrate the virtual machines safely and correctly is worth its weight in gold.
Part 1 – Designing your vMotion network
Part 3 – Multi-NIC vMotion and NetIOC
Part 4 – Choose link aggregation over Multi-NIC vMotion?
Part 5 – 3 reasons why I use a distributed switch for vMotion networks
Thin or thick disks? – it’s about management not performance
This is my contribution to the debate Zero or Thick disks – debunking the performance myth.
Over the last couple of years, VMware engineers have worked very hard to reduce the performance difference between thin disks and thick disks. Many white papers have been written by performance engineers to explain the improvements made to thin disks. Therefore, today the question of whether to use thin-provisioned disks or eager zeroed thick disks is not about the difference in performance but the difference in management.
When using thin-provisioned VMDKs you need to have a very clearly defined process. What do you do when the datastore storing the thin-provisioned disks is getting full? You need to define a consolidation ratio, you need to understand which operational processes might be dangerous to your environment (think Patch Tuesday) and which space utilization threshold you need to define before migrating thin-provisioned disks to other datastores.
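To make such a threshold concrete, it helps to put a number on the over-commitment of each datastore. The sketch below uses the pyVmomi datastore summary figures (provisioned = capacity - free space + uncommitted) to flag datastores above a threshold; the 1.5 ratio is an arbitrary example, not a recommendation.

```python
# Sketch: flag datastores whose thin-provisioning over-commitment exceeds a threshold.
def overcommit_ratio(ds):
    s = ds.summary  # vim.Datastore.Summary
    provisioned = s.capacity - s.freeSpace + (s.uncommitted or 0)
    return provisioned / float(s.capacity)

def datastores_over_threshold(datastores, threshold=1.5):
    # The 1.5 threshold is an arbitrary example value.
    return [(ds.name, round(overcommit_ratio(ds), 2))
            for ds in datastores
            if overcommit_ratio(ds) > threshold]
```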
Today Storage DRS can help you with many of the aforementioned challenges. For more information please read the article: Avoiding VMDK level over-commitment while using Thin-provisioned disks and Storage DRS.
If Storage DRS is not used, thin-provisioned disks require seamless collaboration between the virtualization teams (provisioning and architecture) and the storage administrators. When this is not possible due to organizational or cultural differences, thin provisioning is a risk rather than a blessing.
Zero-out process: Eager zeroed thick disks, on the other hand, might provide a marginal performance increase in some (corner) cases, but the costs involved could outweigh the perceived benefits. First of all, eager zeroed thick disks need to be zeroed out during creation; when your array doesn’t support the VAAI primitives, this can impact performance and extends the time to provision. With terabyte-sized disks becoming more common, this will impact provisioning time immensely.
Waste of space: Most virtualized environments contain virtual machines configured with oversized OS disks and over-specced data disks, resulting in wasted space full of zeros. Thin-provisioned disks only occupy the space used for storing data, not zeros.
Migration: Storage vMotion goes out of its way to migrate every bit of a virtual disk, which means it needs to copy over every zeroed-out block. Combined with the oversized disks, you are creating unnecessary overhead on your hosts and storage subsystem by copying and verifying the integrity of zeroed-out blocks. Migrating thin disks only requires migrating the “user data”, resulting in faster migration times and less overhead on hosts and the storage subsystem.
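For reference, the difference between the two formats comes down to two flags on the disk backing when the disk is created. Below is a hedged pyVmomi sketch of such a device spec; the controller key, unit number and size are illustrative values.

```python
# Sketch: device spec for a new virtual disk, thin or eager zeroed thick.
from pyVmomi import vim

def new_disk_spec(size_kb, controller_key, unit_number, eager_zeroed_thick=False):
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
        diskMode="persistent",
        thinProvisioned=not eager_zeroed_thick,
        eagerlyScrub=eager_zeroed_thick)       # eagerlyScrub=True -> eager zeroed thick

    disk = vim.vm.device.VirtualDisk(
        backing=backing,
        controllerKey=controller_key,
        unitNumber=unit_number,
        capacityInKB=size_kb)

    return vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)

# Usage (illustrative): add a 60GB disk to an existing VM.
# vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[new_disk_spec(60 * 1024 * 1024, 1000, 1)]))
```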
In essence, thin-provisioned disks versus eager zeroed thick is all about resource/time savings versus risk avoidance. Choose wisely.