Resource management within virtual infrastructures relies on distributed algorithms, which has made me increasingly interested in how computer algorithms are applied in other fields. Today I found an English version of the multiple-award-winning Dutch documentary that I can finally share with my non-Dutch-speaking friends. The documentary reviews the flash crash of the U.S. stock market on May 6, 2010. In particular, it explores the application of black-box trading (algo-trading) and how algorithms have shaped not only the architecture of trading institutions but also city architecture and human terraforming.
Money & Speed: Inside the Black Box (Marije Meerman, VPRO)
Sit back and be amazed.
Once you have viewed it, watch the TED presentation by Kevin Slavin: How algorithms shape our world. Kevin Slavin zooms in on some of the algorithms applied in our lives and how they affect us and our surroundings.
Adjusting the cost of vMotion – a word of caution
Yesterday I posted an article on how to change the cost of vMotion in order to change the default number of concurrent vMotions. As I mentioned in that article, I’m not a proponent of changing advanced settings.
Today Kris posted a very interesting question:
How about the scenario where one uses multi-NIC vMotion (for example, two 5Gbps virtual adapters)? I know a cost of 4 will be set for the network by the VMkernel, however as the aggregate bandwidth becomes 10Gbps is it safe enough to raise the limit? Perhaps not to the full 8 for 10Gbps, but 6?
Please note that this article is not meant to bash Kris; he describes a use case I’ve heard a couple of times, so his comment serves as a good example. Although Kris’s scenario sounds like a very good reason to adjust the cost settings and circumvent the line-speed detection the VMkernel uses to determine the max cost of the network resource, it does not account for the other dynamic elements that rely on line speed.
DRS MaxMovesPerHost
ESX 4.1 introduced the MaxMovesPerHost setting, allowing the host limit on moves to be set dynamically. The limit is based on how many moves DRS thinks can be completed in one DRS evaluation interval: DRS adapts to the frequency at which it is invoked (pollPeriodSec, default 300 seconds) and to the average migration time observed during previous migrations. However, this limit is still bound by the detected line speed and the associated max cost. Although the proposed environment has 10Gbps of line speed available in total, the VMkernel will still set the max cost to allow only 4 concurrent vMotions on the host, restricting the number of migrations DRS can initiate during a load-balancing operation.
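To get a feel for this adaptive behavior, here is a rough, illustrative sketch; the function name and the formula are my own simplification, not the actual DRS implementation. The idea is that DRS can schedule roughly as many moves as fit in one invocation period, but never more concurrent vMotions than the max cost allows.

# Rough illustration of the adaptive MaxMovesPerHost idea.
# The formula and names are simplifications for illustration, not DRS code.
def estimate_max_moves_per_host(poll_period_sec=300, avg_migration_time_sec=60,
                                concurrent_limit=4):
    # concurrent_limit: concurrent vMotions allowed by the detected line speed
    # (4 on a 1GbE vMotion network, 8 on 10GbE).
    # Each concurrent "slot" can be reused once a migration completes within
    # the DRS evaluation interval.
    waves = max(1, poll_period_sec // avg_migration_time_sec)
    return concurrent_limit * waves

# Faster previously observed migrations -> more moves per DRS invocation,
# but the network max cost still caps the number of concurrent vMotions.
print(estimate_max_moves_per_host())  # 4 * 5 = 20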
vMotion system resource pool CPU reservation
vMotion tries to move the used memory blocks as fast as possible and consumes as much bandwidth as the available CPU cycles and line speed allow. Depending on the detected line speed, vMotion reserves a certain amount of CPU capacity at the start of a vMotion process; it computes its desired host vMotion CPU reservation. In vSphere 5.1, vMotion reserves 10% of a CPU core for every 1GbE of detected vMotion link speed, with a minimum desired CPU reservation of 30%. This means that with a single 1GbE link vMotion reserves 30% of a core, while with 4 x 1GbE connections it reserves 40% of a core. A 10GbE link is special: vMotion reserves 100% of a single core.
vMotion creates a (system) resource pool and sets the appropriate CPU reservation on it. It’s important to note that the reservation is set on the vMotion resource pool, which means it is shared across all vMotions happening on the host.
Using two 5Gbps links results in a 40% CPU core reservation (the default 30% plus 10% for the extra link). However, this dynamic behavior might go unnoticed if you have enough spare CPU cycles on your source and destination hosts.
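To make the arithmetic explicit, here is a minimal sketch of the reservation rules described above (10% of a core per detected link, a 30% minimum, and 100% of a core for a 10GbE link). The function name and the way links of other speeds are combined are my own assumptions, not VMware code.

# Minimal sketch of the vSphere 5.1 desired vMotion CPU reservation, in percent
# of a CPU core. Assumption: every detected vMotion link below 10GbE contributes
# 10% of a core, a 10GbE link contributes 100%, and the total has a 30% floor.
def vmotion_cpu_reservation_pct(link_speeds_gbps):
    contribution = sum(100 if speed >= 10 else 10 for speed in link_speeds_gbps)
    return max(contribution, 30)

print(vmotion_cpu_reservation_pct([1]))            # single 1GbE link -> 30
print(vmotion_cpu_reservation_pct([1, 1, 1, 1]))   # 4 x 1GbE links   -> 40
print(vmotion_cpu_reservation_pct([10]))           # single 10GbE     -> 100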
Word of caution
I hope these two examples show that there are multiple dynamic elements working together at various levels in your virtual infrastructure. Adjusting a setting might improve the performance of a specific use case, but to change the overall behavior, many settings have to be changed. Due to the lack of time and specific information, correlating the various settings is impossible for most of us most of the time. Therefore, I would like to repeat my recommendation: please do not adjust advanced settings unless VMware support asks you to.
Limiting the number of concurrent vMotions
After explaining how to limit the number of concurrent Storage vMotion operations, I received multiple questions on how to limit the number of concurrent vMotion operations. This article covers the cost and max cost constructs and shows you how to calculate the correct config key values to limit the number of concurrent vMotion operations.
Please note
I usually do not post about configuration keys that change default behavior, simply because I feel that most defaults are sufficient and should only be changed as a last resort when all other avenues are exhausted. I would also like to mention that this is an unsupported configuration: support will request that you remove these settings before troubleshooting your environment!
Cost
To manage and limit the number of concurrent migrations either by vMotion or Storage vMotion, a cost and maximum cost (max cost) factor is applied. Think of the maximum cost as a limit. A resource has a max cost, and an operation is assigned a cost. A vMotion and Storage vMotion are considered operations, and the ESXi host, network, and datastore are considered resources.
In order for a migration operation to start, the cost cannot exceed the max cost. A resource has both a max cost and an in-use cost. When an operation is started, the resource records its in-use cost and allows additional operations until the maximum cost is reached: the sum of the in-use cost of the active operations and the cost of the new operation cannot exceed the max cost.
As mentioned, there are three resources, host, network, and datastore. A vMotion operation interacts with the host, network, and datastore resource, while Storage vMotion interacts with the host and datastore resource. This means that changing the host or datastore-related cost can impact both vMotion and Storage vMotion. Let’s look at the individual costs and max cost before looking into which config key to change.
Host
Operation | Config Key | Cost |
vMotion Cost | costPerVmotionESX41 | 1 |
Storage vMotion Cost | costPerSVmotionESX41 | 4 |
Maximum Cost | maxCostPerEsx41Host | 8 |
Network
Operation | Config Key | Cost |
vMotion Cost | networkCostPerVmotion | 1 |
Storage vMotion Cost | networkCostPerSVmotion | 0 |
Maximum Cost (< 1GbE line speed) | maxCostPerNic | 2 |
Maximum Cost (1GbE to < 10GbE line speed) | maxCostPer1GNic | 4 |
Maximum Cost (10GbE line speed) | maxCostPer10GNic | 8 |
Datastore
Operation | Config Key | Cost |
vMotion Cost | CostPerEsx41Vmotion | 1 |
Storage vMotion Cost | CostPerEsx41SVmotion | 16 |
Maximum Cost | maxCostPerEsx41Ds | 128 |
Please note that these cost values have not changed since 4.1, so no new advanced settings were introduced; the advanced settings listed above therefore apply to ESXi 5.0 and ESXi 5.1 as well.
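To illustrate how these values interact, here is a minimal sketch of the admission check using the defaults listed above; the dictionaries and the function are purely illustrative and are not how the VMkernel implements the check.

# Illustrative admission check using the default cost values listed above.
# Not VMkernel code; names are for illustration only.
vmotion_cost  = {"host": 1, "network": 1, "datastore": 1}
svmotion_cost = {"host": 4, "network": 0, "datastore": 16}
max_cost      = {"host": 8, "network": 4, "datastore": 128}  # 1GbE vMotion network

def can_start(op_cost, in_use_cost, max_cost):
    # An operation may start only if, on every resource it touches,
    # the in-use cost plus the operation cost stays within the max cost.
    return all(in_use_cost[r] + op_cost[r] <= max_cost[r] for r in op_cost)

idle = {"host": 0, "network": 0, "datastore": 0}
print(can_start(vmotion_cost, idle, max_cost))  # True: nothing is running yet

# With 4 vMotions already active, a 5th one is rejected by the network resource:
busy = {"host": 4, "network": 4, "datastore": 4}
print(can_start(vmotion_cost, busy, max_cost))  # False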
Default concurrent vMotion limit
To limit vMotion, we must identify which costs and max costs are involved:
Datastore Cost: As we know, a vMotion transfers the memory from the source ESX host to the destination ESX host, sends over the pages stored in a non-shared page file if this exists, and finally transfers ownership of the VMX file. For the destination host to run the virtual machine, a new VMX file is created on the datastore; therefore, a vMotion process also incurs a cost on the datastore resource. Although it generates overhead, the impact is very low, so the vMotion cost on the datastore is 1.
Datastore Max Cost: The maximum cost of a datastore is 128; therefore, a maximum of 128 concurrent vMotion operations can be active on a single datastore.
Network Cost: The cost for a vMotion on the network resource is 1.
Network Max Cost: This config key is very interesting as it is set dynamically, based on the line speed detected by the VMkernel. If the VMkernel detects a line speed between 1GbE and 10GbE, the max cost value is set to 4. If the VMkernel detects 10GbE, the max cost value is set to 8. Please note that the VMkernel will apply the 10GbE max cost ONLY if it detects a 10GbE line speed. It does not matter whether you use 10GbE Ethernet cards; it’s the detected line speed that counts. Please read the articles “The impact of QoS network traffic on VM performance” and “Adaptive MaxMovesPerHost” if you apply QoS on your converged network and wonder what impact this might have on vMotion performance and DRS load balancing.
If the VMkernel detects a line speed below 1GbE, it sets the max cost to 2, resulting in a maximum of 2 concurrent vMotions with the default network vMotion cost. Please note that the supported minimum required bandwidth is 1GbE! This sub-1GbE setting is included for “just-in-case” scenarios where the vMotion network is temporarily misconfigured; it should not be used to justify a sub-1GbE line speed when designing the virtual infrastructure!
Host Cost: The cost for a vMotion on the host resource is 1.
Host Max Cost: The host max cost for all migration operations is 8.
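Putting those numbers together, the concurrent vMotion limit follows from the most restrictive resource. A quick, illustrative calculation (again, not VMkernel code):

# Illustrative: the default concurrent vMotion limit is the minimum of
# max cost divided by operation cost over all involved resources.
vmotion_cost   = {"host": 1, "network": 1, "datastore": 1}
max_cost_1gbe  = {"host": 8, "network": 4, "datastore": 128}
max_cost_10gbe = {"host": 8, "network": 8, "datastore": 128}

def concurrent_limit(op_cost, max_cost):
    # The most conservative resource wins.
    return min(max_cost[r] // op_cost[r] for r in op_cost if op_cost[r] > 0)

print(concurrent_limit(vmotion_cost, max_cost_1gbe))   # 4 concurrent vMotions (1GbE)
print(concurrent_limit(vmotion_cost, max_cost_10gbe))  # 8 concurrent vMotions (10GbE)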
In-use cost
If the vMotion network is configured with a 1GbE line speed, the network max cost allows for 4 concurrent vMotions, while the host max cost allows for 8 concurrent vMotions. The most conservative max cost wins, as the vMotion network does not allow the in-use cost to exceed the max cost. In the datastore cost section, I explained that the datastore allows for 128 concurrent vMotions. A more common scenario is to see multiple Storage vMotion operations active on a datastore due to Storage DRS datastore maintenance mode. If you are vMotioning a virtual machine that resides on the datastore to another host and you put that datastore into maintenance mode, Storage DRS cannot initiate 8 Storage vMotions, because the cost of 8 Storage vMotions plus the in-use cost of the vMotion would exceed the datastore max cost of 128. “Only” 7 concurrent Storage vMotions can be initiated while the vMotion is active: the in-use cost of the datastore is 7 x 16 = 112, plus 1 for the vMotion, which equals 113. Although 15 “points” are left, the datastore cannot start another Storage vMotion.
Let’s assume the vMotion network is configured with a 10GbE line speed. This means the host allows for 8 concurrent vMotions. But if a Storage vMotion is already active, the in-use cost on the host is 4; therefore, the host can only allow 1 additional Storage vMotion or 4 concurrent vMotions.
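Continuing with the same illustrative numbers (not VMkernel code), the two scenarios above work out as follows:

# Datastore scenario: one vMotion (datastore cost 1) is active; Storage vMotions
# cost 16 each against a datastore max cost of 128.
print((128 - 1) // 16)   # 7 concurrent Storage vMotions; in-use cost 7*16 + 1 = 113

# Host scenario on a 10GbE vMotion network: one Storage vMotion (host cost 4) is
# active against a host max cost of 8, leaving 4 cost points.
print((8 - 4) // 4)      # 1 additional Storage vMotion
print((8 - 4) // 1)      # or 4 concurrent vMotions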
networkCostPerVmotion
As both Storage vMotion and vMotion use the host resource max cost, it is “recommended” to adjust the config key networkCostPerVmotion. Setting this config key to 2 allows for 2 concurrent vMotions on a 1GbE vMotion network per host, or 4 concurrent vMotions on a 10GbE vMotion network per host. The networkCostPerVmotion key can be adjusted by editing the vpxd.cfg or via the advanced settings of the vCenter Server Settings in the administration view.
If done via the vpxd.cfg, the value vpxd.ResourceManager.networkCostPerVmotion is added as follows:
<config>
   <vpxd>
      <ResourceManager>
         <networkCostPerVmotion>new value</networkCostPerVmotion>
      </ResourceManager>
   </vpxd>
</config>
Word of caution
Please note that cost and max cost values are applied to each migration process within vCenter! Therefore, modifying costs impacts normal day-to-day DRS and Storage DRS load-balancing operations as well as the manual vMotion and Storage vMotion operations occurring in the virtual infrastructure managed by the vCenter server. Adjusting costs at the host side can be tricky, as operation costs and limits are relative to each other, and changes can even harm other host processes unrelated to migration.
Adding new disk to an existing virtual machine in a Storage DRS Datastore Cluster
Recently I had some discussions in which I needed to clarify the behavior of Storage DRS when the user adds a new disk to a virtual machine that is already running in the datastore cluster. In particular, what happens if the datastore cluster is near its capacity?
When adding a new disk, Storage DRS reviews the configured affinity rule of the virtual machine. By default, Storage DRS applies an affinity rule (Keep VMDKs together) to all new virtual machines. vSphere 5.1 allows you to change this default behavior; you can change the default affinity rule in the datastore cluster settings.
Adding a new disk to an existing virtual machine
The first thing to realize is that Storage DRS can never violate the affinity or anti-affinity rule of the virtual machine. For example, if the datastore cluster default affinity rule is set to “keep VMDKs together”, then all the files are placed on the same datastore. Ergo, if a new disk is added to the virtual machine, that disk must be stored on the same datastore in order not to violate the affinity rule.
Let’s use an example: VM1 is placed in the datastore cluster and is configured with the Intra-VM affinity rule (keep the files of the VM together).
The virtual machine is configured with 2 hard disks, and both reside on the datastore [nfs-f-07] of the datastore cluster.
When adding another disk, Storage DRS provides me with a recommendation.
Although all the other datastores inside the datastore cluster are excellent candidates as well, Storage DRS is forced to place the VMDK on the same datastore together with the rest of the virtual machine files.
Datastore cluster defragmentation
At one point you may find yourself in a situation where the datastores inside the datastore cluster are quite full: not enough free space per datastore, but enough free space in the datastore cluster to host another disk of an existing virtual machine. In that situation, Storage DRS does not break the affinity rule but starts to “defragment” the datastore cluster. It moves virtual machines around to provide enough free space for the new VMDK. The article “Storage-DRS initial placement and datastore cluster defragmentation” provides more information about this mechanism.
Key takeaway
Therefore, in a datastore cluster you will never see Storage DRS splitting up a virtual machine if the VM is configured with an affinity rule, but you will see prerequisite moves: migrations of other virtual machines out of the datastore to make room for the new VMDK.
Why do you manually select a datastore while using Storage DRS?
On the community forums, a couple of threads are active about the Storage DRS automation level behavior when trying to manually migrate a virtual machine between datastores in the same datastore cluster. When migrating to a specific datastore within the datastore cluster, Storage DRS is disabled for that virtual machine. Some community members asked me why Storage DRS disables automation for the virtual machine when migrating between datastores inside a datastore cluster, or when selecting a specific datastore during placement of a new virtual machine.
Intent
It is all about intent. When migrating a virtual machine into a datastore cluster, you are migrating the virtual machine into a load-balancing domain (the datastore cluster). You allow and trust Storage DRS to provide an optimally balanced environment in which the virtual machines receive the best overall I/O performance and optimal placement regarding space utilization.
If the user wants to migrate the virtual machine to a specific datastore inside the datastore cluster, Storage DRS captures this intent as “the user knows best”. The design assumes that if a specific datastore is selected, the user is telling us that this datastore is the best choice, i.e. the user knows something Storage DRS doesn’t. To prevent any future migration recommendations to other datastores, Storage DRS is disabled for that virtual machine to ensure permanent placement.
This behavior also applies when migrating a virtual machine into a datastore cluster. During initial placement, it is expected that the user selects the datastore cluster; if the user wants to select a specific datastore, he or she has to select “Disable Storage DRS for this virtual machine” in order to be able to select a member datastore.
But this brings me to my question: what is the reason for not trusting Storage DRS? Why do you manually select a datastore while using Storage DRS?
Apparently old habits die hard, and most people tell me that their administrators feel they can beat Storage DRS at placement.
I’ve written a couple of articles (plus two 100+ page chapters featured in two books) about the workings of Storage DRS, and trust me, it’s very difficult to beat Storage DRS placement and migration recommendations. During development, the engineers ran an experiment reminiscent of IBM’s Deep Blue versus Garry Kasparov and lined up two world-class storage experts against Storage DRS. Although the experts received answers to all their questions, they could not match the overall performance improvement Storage DRS could provide.
Correlation of metrics
Storage DRS has a lot of visibility into the environment: it measures the space growth rates of existing virtual machines with thin disks, snapshots, etc., and selects destination datastores based on current utilization and growth rates. It builds device models to understand the performance of the devices backing the datastores, as well as measuring the overall load on the datastores. It creates workload models of the existing virtual machines and measures multiple metrics. Based on these insights, Storage DRS can decide to migrate virtual machines to other datastores in order to make room or to avoid crossing the I/O latency threshold. It analyzes the environment and prefers moving virtual machines with low Storage vMotion overhead.
For more information please read the following articles:
Storage DRS automation level and initial placement behavior
Storage DRS Initial placement and datastore cluster defragmentation
Avoiding VMDK level over commitment while using Thin disks and Storage DRS
If you have a specific use case, for example running a benchmark test on a datastore, the option “Disable Storage DRS for this virtual machine” helps you prevent Storage DRS from interrupting your test. However, I would recommend selecting the datastore cluster as the destination instead of a specific datastore when migrating a virtual machine into a datastore cluster. Read the article “Storage vMotion migration into a datastore cluster” for more information.
Remember, Storage DRS always generates a recommendation that you can review during provisioning. After selecting the destination (the datastore cluster), the user interface provides an overview of the current selections; on the right side of the screen a “more recommendations” link is provided.
More recommendations
After you click the “more recommendations” link, the user interface provides you with a list of alternative recommendations. The list is ordered so that the top recommendation provides the best placement; this is the same recommendation listed in the previous review selection screen. The list provides an overview of the space utilization before placement (2nd column), the space utilization after placement (3rd column), and the I/O latency before placement on the destination datastore (4th column). As the screenshot shows, placing the virtual machine on the EMC-003 datastore provides the best placement: this datastore has the lowest utilization before and after placement and the lowest I/O latency of all the datastores inside the datastore cluster.
Use this screen to educate the team responsible for provisioning and placement; show them that Storage DRS takes multiple metrics into account, and review the impact had they picked the datastore of their choice. For my education, please share your thoughts: why do you want to manually select a datastore that is part of a datastore cluster?