Limiting the number of concurrent vMotions

January 14, 2013 by frankdenneman

After explaining how to limit the number of concurrent Storage vMotion operations, I received multiple questions on how to limit the number of concurrent vMotion operations. This article covers the cost and max cost constructs and shows you how to calculate the correct config key values to limit the number of concurrent vMotion operations.

Please note
I usually do not post on configuration keys that change default behavior, simply because I feel that most defaults are sufficient and they should only be changed as a last resort when all other avenues are exhausted. I would also like to mention that this is an unsupported configuration. Support will ask you to remove these settings before troubleshooting your environment!

Cost
To manage and limit the number of concurrent migrations either by vMotion or Storage vMotion, a cost and maximum cost (max cost) factor is applied. Think of the maximum cost as a limit. A resource has a max cost, and an operation is assigned a cost. A vMotion and Storage vMotion are considered operations, and the ESXi host, network, and datastore are considered resources.
In order for a migration operation to be able to start, the cost cannot exceed the max cost. A resource has both a max cost and an in-use cost. When an operation is started, the resource records an in-use cost and allows additional operations until the maximum cost is reached. The in-use cost of an active operation and the new operation cost cannot exceed the max cost.

As mentioned, there are three resources, host, network, and datastore. A vMotion operation interacts with the host, network, and datastore resource, while Storage vMotion interacts with the host and datastore resource. This means that changing the host or datastore-related cost can impact both vMotion and Storage vMotion. Let’s look at the individual costs and max cost before looking into which config key to change.
Host

Operation | Config Key | Cost
vMotion Cost | costPerVmotionESX41 | 1
Storage vMotion Cost | costPerSVmotionESX41 | 4
Maximum Cost | maxCostPerEsx41Host | 8

Network

Operation | Config Key | Cost
vMotion Cost | networkCostPerVmotion | 1
Storage vMotion Cost | networkCostPerSVmotion | 0
Maximum Cost | maxCostPerNic | 2
Maximum Cost | maxCostPer1GNic | 4
Maximum Cost | maxCostPer10GNic | 8

Datastore

Operation | Config Key | Cost
vMotion Cost | CostPerEsx41Vmotion | 1
Storage vMotion Cost | CostPerEsx41SVmotion | 16
Maximum Cost | maxCostPerEsx41Ds | 128
[Figure: vMotion and Storage vMotion operation costs]

Please note that these values have not changed since 4.1, so the config key names were never updated. These advanced settings (with the ESX41 names) therefore apply to ESXi 5.0 and ESXi 5.1 as well.
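
To make the cost accounting concrete, the following minimal Python sketch (purely illustrative, not VMware code) models the admission check with the default values from the tables above; the network max cost of 4 assumes a 1GbE vMotion network:

# Illustrative model of the cost/max cost admission check (not a VMware tool).
# Max cost per resource; the network value of 4 assumes a 1GbE vMotion NIC.
MAX_COST = {"host": 8, "network": 4, "datastore": 128}

# Cost of a single operation on each resource it touches.
OP_COST = {
    "vmotion":  {"host": 1, "network": 1, "datastore": 1},
    "svmotion": {"host": 4, "network": 0, "datastore": 16},
}

def can_start(operation, in_use):
    """An operation may start only if, on every resource, the in-use cost
    plus the cost of the new operation stays within the max cost."""
    return all(in_use[res] + cost <= MAX_COST[res]
               for res, cost in OP_COST[operation].items())

# Four vMotions are already active: the 1GbE network max cost (4) is reached,
# so a fifth vMotion is denied even though the host max cost (8) still has room.
active = {"host": 4, "network": 4, "datastore": 4}
print(can_start("vmotion", active))   # False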

Default concurrent vMotion limit
To limit the number of concurrent vMotions, we must first identify which costs and max costs are involved:

Datastore Cost: As we know, a vMotion transfers the memory from the source ESXi host to the destination ESXi host, sends over the pages stored in a non-shared page file if this exists, and finally transfers ownership to the new VMX file. For the destination host to run the virtual machine, a new VMX file is created on the datastore; therefore, a vMotion also incurs a cost on the datastore resource. Although it generates overhead, the impact is very low. Therefore, the vMotion cost on the datastore is 1.

Datastore Max Cost: The maximum cost of a datastore is 128, therefore, a maximum of 128 concurrent vMotion operations can be active on a single datastore.

Network cost: The cost for a vMotion on the network resource is 1.

Network Max Cost: This config key is interesting as it is set dynamically, depending on the line speed detected by the VMkernel. If the VMkernel detects a line speed of at least 1GbE but below 10GbE, the max cost is set to 4. If the VMkernel detects 10GbE, the max cost is set to 8. Please note that the VMkernel only sets the max cost to 8 if it actually detects a 10GbE line speed. It does not matter whether you use 10GbE Ethernet cards; it is the detected line speed that counts. Please read the articles “The impact of QoS network traffic on VM performance” and “Adaptive MaxMovesPerHost” if you apply QoS on your converged network and wonder what impact this might have on vMotion performance and DRS load balancing.

If the VMkernel detects a line speed below 1GbE, it sets the max cost to 2, resulting in a maximum of 2 concurrent vMotions with the default network vMotion cost. Please note that the supported minimum required bandwidth is 1GbE! This sub-1GbE setting is included for “just-in-case” scenarios where the vMotion network is temporarily misconfigured; it should not be used to justify a sub-1GbE line speed when designing the virtual infrastructure!

Host cost: The cost for a vMotion on the host resource is 1.

Host max cost: The host max cost for all vMotion operations is 8.

In-use cost
If the vMotion network is configured with a 1GbE line speed, the network max cost allows for 4 concurrent vMotions, while the host max cost allows for 8 concurrent vMotions. The most conservative max cost wins, as the vMotion network does not allow the in-use cost to exceed its max cost. In the datastore cost section, I explained that the datastore allows for 128 concurrent vMotions. What is more common is to see multiple Storage vMotion operations active on a datastore due to Storage DRS datastore maintenance mode. If you are vMotioning a virtual machine that resides on the datastore to another host and you put that datastore into maintenance mode, Storage DRS cannot initiate 8 Storage vMotions, because the cost of 8 Storage vMotions plus the in-use cost of the vMotion would exceed the datastore max cost of 128. “Only” 7 concurrent Storage vMotions can be initiated while the vMotion is active. The in-use cost of the datastore is then 7 x 16 = 112, plus 1 for the vMotion, which equals 113. Although 15 “points” are left, another Storage vMotion (cost 16) cannot start.
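
The datastore arithmetic of this example can be verified with a few lines of Python (again purely illustrative):

# Datastore resource during maintenance mode, as described above.
ds_max_cost      = 128
svmotion_ds_cost = 16
vmotion_ds_cost  = 1

in_use = 7 * svmotion_ds_cost + vmotion_ds_cost   # 7 Storage vMotions + 1 vMotion
print(in_use)                                     # 113
print(ds_max_cost - in_use)                       # 15 "points" left
print(in_use + svmotion_ds_cost <= ds_max_cost)   # False: an 8th Storage vMotion needs 16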

Let’s assume the vMotion network is configured with a 10GbE line speed. This means that the host will allow for 8 concurrent vMotions. But if a Storage vMotion is already active, the in-use cost on the host is 4; therefore, the host only allows for 1 additional Storage vMotion or 4 additional vMotions.

networkCostPerVmotion
As both Storage vMotion and vMotion use the host resource max cost, it is “recommended” to adjust the config key “networkCostPerVmotion”. Setting this config key to 2 allows for 2 concurrent vMotions per host on a 1GbE vMotion network, or 4 concurrent vMotions per host on a 10GbE vMotion network. The networkCostPerVmotion config key can be adjusted by editing the vpxd.cfg or via the advanced settings of the vCenter Server Settings in the administration view.

If done via the vpxd.cfg, the value vpxd.ResourceManager.networkCostPerVmotion is added as follows:

<config>
  <vpxd>
    <ResourceManager>
      <networkCostPerVmotion>new value</networkCostPerVmotion>
    </ResourceManager>
  </vpxd>
</config>
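
To estimate the effect of a different networkCostPerVmotion value before applying it, a back-of-the-envelope calculation helps. The sketch below is my own illustration (the function name and parameters are made up, this is not a VMware API); it simply takes the most conservative limit across the network and host resources:

# Illustrative helper: concurrent vMotion limit per host for a given
# networkCostPerVmotion, using the default max costs described above.
def concurrent_vmotion_limit(network_cost_per_vmotion,
                             nic_max_cost=4,      # 4 for 1GbE, 8 for 10GbE
                             host_max_cost=8,
                             host_cost_per_vmotion=1):
    # The most conservative resource determines the limit.
    return min(nic_max_cost // network_cost_per_vmotion,
               host_max_cost // host_cost_per_vmotion)

print(concurrent_vmotion_limit(1))                    # 4  (default cost, 1GbE)
print(concurrent_vmotion_limit(2))                    # 2  (adjusted cost, 1GbE)
print(concurrent_vmotion_limit(2, nic_max_cost=8))    # 4  (adjusted cost, 10GbE)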


Word of caution
Please note that the cost and max cost values are applied to every migration process within vCenter! Modifying costs therefore impacts normal day-to-day DRS and Storage DRS load-balancing operations as well as the manual vMotion and Storage vMotion operations occurring in the virtual infrastructure managed by the vCenter server. Adjusting the cost on the host side can be tricky, as operation costs and limits are relative to each other, and a change can even harm other host processes unrelated to migrations.

Filed Under: vMotion

Adding new disk to an existing virtual machine in a Storage DRS Datastore Cluster

January 11, 2013 by frankdenneman

Recently I had some discussions in which I needed to clarify the behavior of Storage DRS when the user adds a new disk to a virtual machine that is already running in the datastore cluster, especially what happens if the datastore cluster is near its capacity.
When adding a new disk, Storage DRS reviews the configured affinity rule of the virtual machine. By default, Storage DRS applies an affinity rule (Keep VMDKs together) to all new virtual machines. vSphere 5.1 allows you to change this default behavior; you can change the default affinity rule in the datastore cluster settings:
[Screenshot: datastore cluster default VM affinity rule]
Adding a new disk to an existing virtual machine
The first thing to realize is that Storage DRS can never violate the affinity or anti-affinity rule of the virtual machine. For example, if the datastore cluster default affinity rule is set to “keep VMDKs together”, then all the files are placed on the same datastore. Ergo, if a new disk is added to the virtual machine, that disk must be stored on the same datastore in order not to violate the affinity rule.
Let’s use an example: VM1 is placed in the datastore cluster and is configured with the Intra-VM affinity rule (keep the files inside the VM together).
[Screenshot: Intra-VM affinity rule]
The virtual machine is configured with 2 hard disks, and both reside on datastore [nfs-f-07] of the datastore cluster.
[Screenshot: VM configuration]
When adding another disk, Storage DRS provides me with a recommendation:
[Screenshot: placement recommendation]
Although all the other datastores inside the datastore cluster are excellent candidates as well, Storage DRS is forced to place the VMDK on the same datastore together with the rest of the virtual machine files.
[Screenshot: datastore cluster overview]
Datastore cluster defragmentation
At one point you may find yourself in a situation where the datastores inside the datastore cluster are quite full: not enough free space per datastore, but enough free space in the datastore cluster to host another disk of an existing virtual machine. In that situation, Storage DRS does not break the affinity rule but starts to “defragment” the datastore cluster: it moves virtual machines around to provide enough free space for the new VMDK. The article “Storage DRS initial placement and datastore cluster defragmentation” provides more information about this mechanism.
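To illustrate this behavior, the toy model below is my own simplification (not Storage DRS code; the VM names and sizes in the example are made up). It keeps the new disk on the datastore that already holds the VM and, if the disk does not fit, first moves other virtual machines out of the way:

# Toy model of "keep VMDKs together" placement with datastore defragmentation.
def place_new_disk(vm_datastore, new_disk_gb, free_gb, movable_vms):
    """Return the target datastore and the prerequisite moves needed.
    free_gb: free space per datastore; movable_vms: {datastore: {vm: size_gb}}."""
    free = free_gb[vm_datastore]
    prerequisite_moves = []
    # The affinity rule pins the new disk to the VM's datastore, so the model
    # "defragments": it moves other VMs off the datastore until the disk fits.
    for other_vm, size in movable_vms.get(vm_datastore, {}).items():
        if free >= new_disk_gb:
            break
        prerequisite_moves.append(other_vm)
        free += size
    if free < new_disk_gb:
        raise RuntimeError("Not enough space, even after prerequisite moves")
    return vm_datastore, prerequisite_moves

# Example: the VM lives on nfs-f-07, which has only 10GB free; another VM is
# moved out first so the 40GB disk can stay with the rest of the VM files.
print(place_new_disk("nfs-f-07", new_disk_gb=40,
                     free_gb={"nfs-f-07": 10},
                     movable_vms={"nfs-f-07": {"VM7": 50}}))
# ('nfs-f-07', ['VM7'])
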
Key takeaway
Therefore, in a datastore cluster you will never see Storage DRS splitting up a virtual machine if the VM is configured with an affinity rule, but you will see prerequisite moves: migrations of other virtual machines out of the datastore to make room for the new VMDK.

Filed Under: Storage DRS

Why do you manually select a datastore while using Storage DRS?

January 10, 2013 by frankdenneman

On the community forums, a couple of threads are active about Storage DRS automation level behavior when manually migrating a virtual machine between datastores in the same datastore cluster. When you migrate a virtual machine to a specific datastore within the datastore cluster, Storage DRS is disabled for that virtual machine. Some community members asked me why Storage DRS disables automation for the virtual machine when migrating between datastores inside a datastore cluster, or when selecting a datastore during placement of a new virtual machine.
Intent
It is all about intent. When migrating a virtual machine into a datastore cluster, you are migrating it into a load-balancing domain (the datastore cluster). You allow and trust Storage DRS to provide an optimally balanced environment, in which the virtual machines receive the best overall I/O performance and optimal placement regarding space utilization.
If the user wants to migrate the virtual machine to a specific datastore inside the datastore cluster, Storage DRS captures this intent as “user knows best”. The design assumes that if a datastore is selected, the user is telling Storage DRS that this datastore is the best choice, i.e. the user knows something Storage DRS doesn’t. To prohibit any future migration recommendations to other datastores, Storage DRS is disabled for the virtual machine to ensure permanent placement.
This behavior also applies when migrating a virtual machine into a datastore cluster. During initial placement it is expected that the user selects the datastore cluster; if the user wants to select a specific datastore, he or she has to select “Disable Storage DRS for this virtual machine” in order to be able to select a member datastore.
But this brings me to the question I have; what is the reason for not trusting Storage DRS? Why do you manually select a datastore while using Storage DRS?
Apparently old habits die hard, and most tell me that their administrators feel they can beat Storage DRS at placement.
I’ve written a couple of articles (plus two 100+ page chapters featured in two books) about the workings of Storage DRS, and trust me, it is very difficult to beat Storage DRS placement and migration recommendations. During development, the engineers ran an experiment similar to IBM’s Deep Blue versus Garry Kasparov and lined up two world-class storage experts against Storage DRS. Although the experts received answers to all their questions, they could not match the overall performance improvement Storage DRS could provide.
Correlation of metrics
Storage DRS has a lot of visibility into the environment: it measures space growth rates of existing virtual machines with thin disks, snapshots, etc. It selects destination datastores based on current utilization and growth rates. It builds device models to understand the performance of the devices backing the datastores, as well as measuring the overall load on the datastores. It creates workload models of the existing virtual machines and measures multiple metrics. Due to these insights, Storage DRS can decide to migrate virtual machines to other datastores in order to make room or to avoid crossing the I/O threshold. It analyzes the environment and prefers moving virtual machines with low Storage vMotion overhead.
For more information please read the following articles:
Storage DRS automation level and initial placement behavior
Storage DRS Initial placement and datastore cluster defragmentation
Avoiding VMDK level over commitment while using Thin disks and Storage DRS
If you have a specific use case, for example running a benchmark test on a particular datastore, then the option “Disable Storage DRS for this virtual machine” helps you prevent Storage DRS from interrupting your test. However, I would recommend selecting the datastore cluster as the destination instead of a specific datastore when migrating a virtual machine into a datastore cluster. Read the article “Storage vMotion migration into a datastore cluster” for more information.
Remember, Storage DRS always generates a recommendation that you can review during provisioning. After selecting the destination (datastore cluster), the user interface provides an overview of the current selections; at the right part of the screen a “more recommendations” link is provided.
[Screenshot: more recommendations link]
More recommendations
After you click on the more recommendations link, the user interface provides you with a list of alternative recommendations. The list is ordered so that the top recommendation provides the best placement; this is the same recommendation listed in the previous Review Selection screen. The list provides an overview of the space utilization before placement (2nd column), the space utilization after placement (3rd column) and the I/O latency before placement on the destination datastore (4th column). As the screenshot shows, placing the virtual machine on the EMC-003 datastore provides the best placement: this datastore has the lowest utilization before and after placement and the lowest I/O latency of all the datastores inside the datastore cluster.
[Screenshot: datastore recommendations]
Use this screen to educate the team responsible for provisioning and placement: show them that Storage DRS takes multiple metrics into account, and review the impact of the result if they had picked the datastore of their choice. For my education, please share your thoughts on why you would manually select a datastore that is part of a datastore cluster.

Filed Under: Storage DRS

vSphere 5.1 web client: VM overrides -Storage DRS automation level overview

January 9, 2013 by frankdenneman

Overall, the vSphere 5.1 web client attempts to mimic the behavior of the menus and settings workflows of the (old) vSphere client. When editing the settings of a datastore cluster, the web client provides the same set of editable options as the vSphere client. However, certain overviews and menus have changed in the vSphere 5.1 web client, for example the VM overrides screen. The primary purpose of the VM overrides screen is to display the virtual machines inside the datastore cluster with a Storage DRS automation level that deviates from the cluster default.
VM overrides and Virtual Machine Settings screens
The VM overrides screen is located in the storage view: select the datastore cluster, select the Manage tab and click on the Settings button.
[Screenshot: web client default view]
The VM overrides screen is the replacement of the virtual machine settings screen of the datastore cluster settings in the vSphere client.
[Screenshot: vSphere client Virtual Machine Settings]
Difference in default behavior
As you might have noticed, the web client does not list any virtual machines, while the Virtual Machine Settings overview in the vSphere client shows 5 virtual machines and a VM template. As mentioned in the introduction, the primary purpose of the VM overrides screen has changed compared to the Virtual Machine Settings overview in the vSphere client. The VM overrides screen only displays a virtual machine if it is set to a non-default automation level.
To demonstrate the different behavior, I have changed the automation level of VM3, VM4 and VM5. The datastore cluster is configured with the Manual automation mode; therefore the default automation level is Default (Manual). The previous screenshot shows that all virtual machines were configured with the Default (Manual) automation level; VM3 is changed to Fully Automated, VM4 to Manual and VM5 to Disabled. If you want to reproduce this behavior in your own environment, change the automation level in the vSphere client and then go to the VM overrides screen in the web client to see the modified virtual machines listed.
[Screenshot: vSphere client with modified automation levels]
The VM overrides screen displays the following:
[Screenshot: web client VM overrides]
Even though VM4 is configured with the same automation level as the datastore cluster, the VM overrides screen displays VM4 because it is not configured with the Default (Manual) setting. By changing the automation mode back to Default (Manual) via the Edit screen, VM4 is removed from the VM overrides list.
[Screenshot: edit VM override]
[Screenshot: VM overrides list after the change]
To be honest, it took me a while to get used to the new functionality of this screen. I would like to know whether you like this new behavior or whether you prefer the way the virtual machine settings view in the old vSphere client works.

Filed Under: Storage DRS

Manual storage vMotion migrations into a datastore cluster

January 8, 2013 by frankdenneman

Frequently I receive questions about the impact of a manual migration into a datastore cluster, especially about the impact on the VM disk file layout. Will Storage DRS take the initial disk layout into account, or will it be changed? The short answer is that the virtual machine disk layout will be changed according to the default affinity rule configured on the datastore cluster. This article describes several scenarios of migrating “distributed” and “centralized” disk layout configurations into datastore clusters configured with different affinity rules.
Test scenario architecture
For the test scenarios I’ve built two virtual machines, VM1 and VM2. Both virtual machines have an identical configuration; only the datastore location is different. VM1-centralized has a “centralized” configuration, storing all VMDKs on a single datastore, while VM2-distributed has a “distributed” configuration, storing all VMDKs on separate datastores.

Hard disk | Size | VM1 datastore | VM2 datastore
Working directory | 8GB | FD-X4 | FD-X4
Hard disk 1 | 60GB | FD-X4 | FD-X4
Hard disk 2 | 30GB | FD-X4 | FD-X5
Hard disk 3 | 10GB | FD-X4 | FD-X6

Two datastore clusters exist in the virtual infrastructure:

Datastore cluster | Default affinity rule | VMDK rule applied to VM
Tier-1 VMs and VMDKs | Do not keep VMDKs together | Intra-VM anti-affinity rule
Tier-2 VMs and VMDKs | Keep VMDKs together | Intra-VM affinity rule

Test 1: VM1-centralized to Datastore Cluster Tier-2 VMs and VMDKs
Since the virtual machine is stored on a single datastore, it makes sense to start off by migrating the virtual machine to the datastore cluster that applies a VMDK affinity rule, keeping the virtual machine disk files together on a single datastore in the datastore cluster. Select the virtual machine, right-click the virtual machine to display the submenu and select the option “Migrate…”. The first step is to select the migration type; select “Change datastore”.
[Screenshot: select migration type]
The second step is to select the destination datastore; as we are planning to migrate the virtual machine to a datastore cluster, it is necessary to select the datastore cluster object.
[Screenshot: select datastore]
After clicking next, the user interface displays the Review Selection screen; notice that the datastore cluster applied the default cluster affinity rule.
[Screenshot: review selections]
Storage DRS has evaluated the current load of the datastore cluster and the configuration of the virtual machine; it concludes that datastore nfs-f-05 is the best fit for the virtual machine, the existing virtual machines in the datastore cluster and the load balance state of the cluster. By clicking “more recommendations”, other datastore destinations are presented.
Test result: Intra-VM affinity rule applied and all virtual machine disk files are stored on a single datastore
Selecting the Datastore cluster object
The user interface provides you with two options: select the datastore cluster object, or select a datastore that is part of the datastore cluster; for the latter option you explicitly need to disable Storage DRS for this virtual machine. By selecting the datastore cluster, you fully leverage the strength of Storage DRS. Storage DRS initiates its algorithms and evaluates the current state of the datastore cluster. It reviews the configuration of the new virtual machine and is aware of the I/O load of each datastore as well as the space utilization. Storage DRS weighs both metrics and weighs either space or I/O load heavier depending on which utilization is higher.
Disable Storage DRS for this virtual machine
By default it is not possible to select a specific datastore that is part of a datastore cluster during the second step, “Select Datastore”. In order to do that, one must tick the option box “Disable Storage DRS for this virtual machine”. By doing so, the datastores in the lower part of the screen become available for selection. However, this means that the virtual machine will be excluded from any Storage DRS load-balancing operation. Not only does this affect the virtual machine itself, it also impacts other Storage DRS operations such as maintenance mode and datastore cluster defragmentation. As Storage DRS is not allowed to move the virtual machine, it cannot migrate the virtual machine to reach an optimum load balance state when Storage DRS needs to make room for an incoming virtual machine. For more information about cluster defragmentation, read the following article: Storage DRS initial placement and datastore cluster defragmentation.
Test 2: VM1-centralized to Datastore Cluster Tier-1 VMs and VMDKs
Migrating a virtual machine stored on a single datastore to a datastore cluster with anti-affinity rules enabled results in a distribution of the virtual machine disk files:
[Screenshot: VM1 migrated to datastore cluster Tier-1]
Test result: Intra-VM anti-affinity rule applied and the virtual machine disk files are placed on separate datastores.
Working directory and default anti-affinity rules
Please note that in the previous scenario the configuration file (working directory) is placed on the same datastore as Hard disk 3. Storage DRS does not forcefully attempt to place the working directory on a different datastore; it weighs the load balance state of the cluster heavier than separating the working directory from the virtual machine’s VMDK files.
Test 3: VM2-distributed to Datastore Cluster Tier-1 VMs and VMDKs
Following the example of VM1, I started off by migrating VM2-distributed to Tier-1, as this datastore cluster is configured to mimic the initial state of the virtual machine, which is to distribute the virtual machine across as many datastores as possible. After selecting Datastore Cluster Tier-1 VMs and VMDKs, Storage DRS provided the following recommendation:
[Screenshot: VM2 migrated to datastore cluster Tier-1]
Test result: Intra-VM anti-affinity rule applied on VM and the virtual machine disk files are stored on separate datastores.
A nice tidbit: as every virtual disk file is migrated between two distinct datastores, this scenario leverages the new parallel disk migration functionality introduced in vSphere 5.1.
Test 4: VM2-distributed to Datastore Cluster Tier-2 VMs and VMDKs
What happens if you migrate a distributed virtual machine to a datastore cluster configured with a default affinity rule? After selecting Datastore Cluster Tier-2 VMs and VMDKs, Storage DRS provided the following recommendation:
[Screenshot: distributed VM with affinity rule applied]
Test result: Intra-VM affinity rule applied on VM and the virtual machine disk files are placed on a single datastore.
Test 5: VM2-distributed to Multiple Datastore clusters
A common use case is to distribute a virtual machine across multiple tiers of storage to provide performance while taking economics into account. This test simulates the exercise of placing the working directory and guest OS disk (Hard disk 1) on datastore cluster Tier 2 and the database and logging hard disks (Hard disk 2 and Hard disk 3) on datastore cluster Tier 1.
In order to configure the virtual machine to use multiple datastore clusters, click on the Advanced button during the second step of the migration:
[Screenshot: advanced option]
This screen shows the current configuration; by selecting the current datastore of a hard disk, a browse menu appears:
[Screenshot: distributed configuration]
Select the appropriate datastore cluster for each hard disk and click on next to receive the destination datastore recommendation from Storage DRS.
The working directory of the VM and Hard disk 1 are stored on datastore cluster Tier 2 and Hard disk 2 and Hard disk 3 are stored in datastore cluster Tier 1.
[Screenshot: multiple datastore clusters]
As datastore cluster Tier 2 is configured to keep the virtual machine files together, both the working directory (designated as Configuration file in the UI) and Hard disk 1 are placed on datastore nfs-f-05. A default anti-affinity rule is applied to all new virtual machines in datastore cluster Tier 1; therefore Storage DRS recommends placing Hard disk 2 on nfs-f-07 and Hard disk 3 on datastore nfs-f-01.
Test result: Intra-VM anti-affinity rule applied on VM. The files stored in Tier-2 are placed on a single datastore, while the virtual machine disk files stored in Tier-1 are located on different datastores.

Initial VM configuration | Cluster default affinity rule | Result | Configured on
Centralized | Affinity rule | Centralized | Entire VM
Centralized | Anti-affinity rule | Distributed | Entire VM
Distributed | Anti-affinity rule | Distributed | Entire VM
Distributed | Affinity rule | Centralized | Entire VM
Distributed | Affinity rule | Centralized | Working directory + Hard disk 1
Distributed | Anti-affinity rule | Distributed | Hard disk 2 and Hard disk 3

All types of migration via the UI lead to a successful integration with the datastore cluster. Every migration results in the application of the correct affinity or anti-affinity rule, as set by the default affinity rule of the cluster.

Filed Under: Storage DRS, vMotion Tagged With: Storage DRS, vMotion
