Over the last couple of months I’ve seen recommendations popping up to set the MinGoodness and CostBenefit settings to zero on a DRS cluster (KB1017291). The usual trigger is a maintenance window: after hosts were placed in maintenance mode, the hosts remain unevenly loaded and DRS won’t migrate virtual machines to the less loaded hosts.
By disabling these adaptive algorithms, DRS will consider every move and distribute the virtual machines aggressively across the hosts. Although this sounds very appealing, the MinGoodness and CostBenefit calculations were created for a reason. Let’s explore the DRS algorithm and see why this setting should only be used temporarily and not as a permanent setting.
DRS load balance objectives
The primary objective of DRS is to provide virtual machines with their required resources. If a virtual machine is getting the resources it requests (its dynamic entitlement), then there is no need to find a better spot. If the virtual machines do not get the resources specified in their dynamic entitlement, then DRS will consider moving the virtual machine, depending on additional factors.
This means that DRS allows certain situations where the administrator feels the cluster is unbalanced, such as an uneven virtual machine count on the hosts inside the cluster. I’ve seen situations where one host was running 80% of the load while the other hosts were running only a couple of virtual machines. This particular cluster was comprised of big hosts, each containing 1TB of memory, while the entire virtual machine memory footprint was no more than 800GB. One host could easily run all virtual machines and provide the resources the virtual machines were requesting.
This particular scenario describes the biggest misunderstanding of DRS: DRS is not primarily designed to distribute virtual machines equally across the hosts in the cluster. It distributes the load as efficiently as possible across the resources to provide the best performance for the virtual machines. And this is the key to understanding why DRS does or does not generate migration recommendations. Efficiency! Moving virtual machines around costs CPU cycles, memory resources and, to a smaller extent, datastore operations (the stun/unstun of virtual machines). In the most extreme case, load balancing itself can endanger the performance of virtual machines by withholding resources from them and using those resources to move virtual machines. This is the worst-case scenario, but the main point is that the load balancing process costs resources that could also be used by virtual machines providing their services, which is the primary reason the virtual infrastructure was created. To manage and contain the resource consumption of load balancing operations, the MinGoodness and CostBenefit calculations were created.
CostBenefit
DRS calculates the Cost Benefit (and risk) of a move. Cost: how many resources does it take to move a virtual machine by vMotion? A virtual machine that is constantly updating its large memory footprint costs more CPU cycles and network traffic than a virtual machine with a medium memory footprint that has been idling for a while. Benefit: how many resources will the move free up on the source host, and what will the impact be on the normalized entitlement of the destination host? The normalized entitlement is the sum of the dynamic entitlements of all the virtual machines running on that host divided by the capacity of the host. Risk is a prediction of how the workload might change on both the source and destination host, and of whether the outcome of moving the candidate virtual machine is still positive when the workload changes.
MinGoodness
To understand which host the virtual machine must move to, DRS uses the normalized entitlement of the host as the key metric and will only consider hosts that have a lower normalized entitlement than the source host. MinGoodness helps DRS understand what effect the move has on the overall cluster imbalance.
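The normalized entitlement and the destination filter described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the DRS implementation: the `Host` class, the host names and the MHz figures are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical model of a host; all values below are example numbers.
    name: str
    capacity_mhz: float
    vm_entitlements_mhz: List[float]


def normalized_entitlement(host: Host) -> float:
    """Sum of the dynamic entitlements of the VMs on the host / host capacity."""
    return sum(host.vm_entitlements_mhz) / host.capacity_mhz


def candidate_destinations(source: Host, hosts: List[Host]) -> List[Host]:
    """Keep only hosts with a lower normalized entitlement than the source."""
    source_ne = normalized_entitlement(source)
    return [h for h in hosts
            if h is not source and normalized_entitlement(h) < source_ne]


esx01 = Host("esx01", 20000.0, [4000.0, 6000.0, 5000.0])
esx02 = Host("esx02", 20000.0, [2000.0, 1000.0])
esx03 = Host("esx03", 20000.0, [9000.0, 8000.0])
cluster = [esx01, esx02, esx03]

for host in cluster:
    print(host.name, normalized_entitlement(host))

print([h.name for h in candidate_destinations(esx01, cluster)])
```

In this example esx01 sits at 0.75, so only esx02 (0.15) qualifies as a destination; esx03 (0.85) is more heavily loaded than the source and is skipped.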
DRS awards every move a CostBenefit and a MinGoodness rating, and these are linked together. DRS will only recommend a move with a negative CostBenefit rating if the move has a highly positive MinGoodness rating. Due to the metrics used, CostBenefit ratings are usually more conservative than MinGoodness ratings. As a result, the cost or risk of a particular move often overrules the decision to move a virtual machine to a host with a lower normalized entitlement.
When MinGoodness and CostBenefit are set to zero, DRS calculates the cluster imbalance and recommends any move* that improves the balance of the normalized entitlement of the hosts within the cluster, without considering the resource cost involved. In oversized environments, where resource supply is abundant, setting these options temporarily should not create a problem. In environments where resource demand rivals resource supply, setting these options can create resource starvation.
*The number of recommendations is limited by the MaxMovesPerHost calculation. This article contains more information about MaxMovesPerHost.
Recommendation
My recommendation is to use this advanced option sparingly, only when host load is extremely unbalanced and DRS does not provide any migration recommendations. This typically happens after the hosts in the cluster were placed in maintenance mode. Permanently activating this advanced option is similar to lobotomizing the DRS load balancing algorithm. It can do more harm in the long run, as you might see virtual machines in an almost-constant state of vMotion.
Limiting the number of Storage vMotions
When enabling datastore maintenance mode, Storage DRS will move virtual machines out of the datastore as fast as it can. The number of virtual machines that can be migrated into or out of a datastore concurrently is 8. This is related to the concurrent migration limits of hosts, network and datastores. To manage and limit the number of concurrent migrations, either by vMotion or Storage vMotion, a cost and limit factor is applied. Although the term limit is used, a better description of limit is maximum cost.
For a migration operation to start, the cost cannot exceed the max cost (limit). vMotion and Storage vMotion are considered operations. The ESXi host, network and datastore are considered resources. A resource has both a max cost and an in-use cost. When an operation is started, the sum of the in-use cost and the cost of the new operation cannot exceed the max cost.

The operation cost of a Storage vMotion on a host is “4”, and the max cost of a host is “8”. If one Storage vMotion operation is running, the in-use cost of the host resource is “4”, allowing one more Storage vMotion process to start without exceeding the host limit.
As a Storage vMotion operation also counts against the storage resource, the max cost and in-use cost of the datastore need to be factored in as well. The operation cost of a Storage vMotion for datastores is set to 16, and the max cost of a datastore is 128. This means that 8 concurrent Storage vMotion operations can be executed on a datastore. These operations can be started on multiple hosts, but no more than 2 Storage vMotions can run from the same host due to the max cost at the host level.
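The admission rule can be expressed as a small Python sketch. It uses the host and datastore costs quoted above, but the `Resource` class and the function names are invented for this illustration, and the network resource is left out because its costs are not listed here. It is a model of the bookkeeping, not vCenter code.

```python
from dataclasses import dataclass

# Costs taken from the text above.
SVMOTION_HOST_COST = 4
SVMOTION_DATASTORE_COST = 16
HOST_MAX_COST = 8
DATASTORE_MAX_COST = 128


@dataclass
class Resource:
    # Hypothetical bookkeeping object for a host or a datastore.
    name: str
    max_cost: int
    in_use_cost: int = 0

    def can_admit(self, op_cost: int) -> bool:
        # In-use cost plus the new operation cost may not exceed the max cost.
        return self.in_use_cost + op_cost <= self.max_cost


def try_start_storage_vmotion(host: Resource, datastore: Resource) -> bool:
    """Start a Storage vMotion only if both host and datastore can admit it."""
    if (host.can_admit(SVMOTION_HOST_COST)
            and datastore.can_admit(SVMOTION_DATASTORE_COST)):
        host.in_use_cost += SVMOTION_HOST_COST
        datastore.in_use_cost += SVMOTION_DATASTORE_COST
        return True
    return False


# One host, one datastore: the third attempt hits the host limit.
host = Resource("esx01", HOST_MAX_COST)
datastore = Resource("ds01", DATASTORE_MAX_COST)
single_host_results = [try_start_storage_vmotion(host, datastore) for _ in range(3)]
print(single_host_results)

# Five hosts, two attempts each: the datastore limit stops the ninth operation.
hosts = [Resource(f"esx{i:02d}", HOST_MAX_COST) for i in range(1, 6)]
shared_datastore = Resource("ds02", DATASTORE_MAX_COST)
started = sum(try_start_storage_vmotion(h, shared_datastore)
              for h in hosts for _ in range(2))
print(started)
```

The first run shows the host limit (two operations per host), the second shows the datastore limit (eight operations in total, regardless of how many hosts take part).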

How to throttle the number of Storage vMotion operations?
To throttle the number of Storage vMotion operations and reduce the IO hit on a datastore during maintenance mode, it is preferable to reduce the max cost for provisioning operations on the datastore. Adjusting host costs is strongly discouraged. Host costs are defined as they are because of host resource limitations, and adjusting them can impact other host functionality unrelated to vMotion or Storage vMotion processes.
Adjusting the max cost per datastore can be done by editing the vpxd.cfg or via the advanced settings of the vCenter Server Settings in the administration view.
If done via the vpxd.cfg, the value vpxd.ResourceManager.MaxCostPerEsx41Ds is added as follows:
<config>
  <vpxd>
    <ResourceManager>
      <MaxCostPerEsx41Ds>new value</MaxCostPerEsx41Ds>
    </ResourceManager>
  </vpxd>
</config>
As the max cost has not been increased since ESX 4.1, the value-name
Please remember to leave some room for vMotion when resizing the max cost of a datastore. The vMotion process has a datastore cost as well. During the stun/unstun of a virtual machine the vMotion process hits the datastore; the cost involved in this process is 1.
For example, Changing the
Please note that cost and max values are applied to each migration process. They impact normal day-to-day DRS and Storage DRS load balancing operations as well as the manual vMotion and Storage vMotion operations occurring in the virtual infrastructure managed by the vCenter server.
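To get a feel for the numbers when resizing the datastore max cost, the following Python sketch computes how many Storage vMotions fit within a given max cost while keeping some budget free for vMotion. The operation costs (16 and 1) come from the text above; the function name and the example max cost values of 64 and 68 are purely illustrative and are not recommended settings.

```python
# Datastore operation costs as described in the text.
SVMOTION_DATASTORE_COST = 16
VMOTION_DATASTORE_COST = 1


def max_concurrent_svmotions(datastore_max_cost: int,
                             reserved_vmotions: int = 0) -> int:
    """Storage vMotions that fit after reserving budget for some vMotions.

    Hypothetical helper: it only does the arithmetic, it does not talk to vCenter.
    """
    budget = datastore_max_cost - reserved_vmotions * VMOTION_DATASTORE_COST
    return max(budget, 0) // SVMOTION_DATASTORE_COST


print(max_concurrent_svmotions(128))      # default max cost, nothing reserved
print(max_concurrent_svmotions(128, 4))   # default max cost, room for 4 vMotions
print(max_concurrent_svmotions(64))       # example reduced max cost
print(max_concurrent_svmotions(68, 4))    # example: 4 Storage vMotions plus 4 vMotions
```

With the default of 128, reserving room for even a single vMotion drops the Storage vMotion count from 8 to 7, which is why a max cost slightly above a multiple of 16 can be a sensible choice.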
As mentioned before, adjusting the cost on the host side can be tricky, as the operation costs and limits are relative to each other and changing them can even harm other host processes unrelated to migration. If you still have the urge to change the cost on the host, consider the impact on DRS! When the cost of a Storage vMotion operation on the host is increased, the available “slots” for vMotion operations are reduced. This might impact DRS load balancing efficiency when a Storage vMotion process is active and should be avoided at all times.
Get notifications of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman
Fab-four: VMworld 2012 sessions approved
This morning I found out that my four sessions have been accepted. I’m really pleased and I am looking forward to presenting each one of them. Two sessions, Architecting Storage DRS Datastore Clusters and vSphere Cluster Resource Pools Best Practices, are also scheduled for VMworld Barcelona.
Session ID: STO1545
Session Title: Architecting Storage DRS Datastore Clusters
Track: Infrastructure
Presenting at: US and Barcelona
Presenting with: Valentin Hamburger
Session ID: VSP1504
Session Title: Ask the Expert vBloggers
Track: Infrastructure
Presenting at: US
Presenting with: Duncan Epping, Scott Lowe, Rick Scherer and Chad Sakac
Session ID: VSP1683
Session Title: vSphere Cluster Resource Pools Best Practices
Track: Infrastructure
Presenting at: US and Barcelona
Presenting with: Rawlinson Rivera
Session ID: CSM1167
Session Title: Architecting for vCloud Allocation Models
Track: Operations
Presenting at: US
Presenting with: Chris Colotti
Can’t wait to attend VMworld 2012! See you there.
VMware vSphere Storage DRS Interoperability technical paper available
Today my second white paper, VMware vSphere Storage DRS Interoperability, has been made available for download at the Technical Resource Center at VMware.com.
This white paper presents an overview of best practices for customers considering the implementation of VMware vSphere Storage DRS in combination with advanced storage device features or other VMware products. The document zooms in on Storage DRS interoperability with array-based features such as Auto-Tiering, Thin Provisioning and Deduplication, and also covers VMware features such as Snapshots. A small preview:
VMware vSphere Snapshots
Storage DRS supports virtual machine snapshots. By default, it collocates them with the virtual machine disk file to prevent fragmentation of the virtual machine. Also by default, Storage DRS applies a VMDK affinity rule to each new virtual machine. If it migrates the virtual machine to another datastore, all the files, including the snapshot files, move with it. If the virtual machine is configured with an inter-VMDK affinity setting, the snapshot is placed in the directory of its related disk and is moved to the same destination datastore as when migrated by a Storage vMotion operation.
VMware supports the use of vSphere snapshots in combination with Storage DRS.

Go and download it here: http://www.vmware.com/resources/techresources/10286
VMworld Session proposals
Here is just a quick overview of the sessions I submitted for the VMworld events in San Francisco and Barcelona. I’ve submitted three sessions in total. As my passion for resource management and Storage DRS is a public secret, it should be no surprise that all sessions I participate in focus on either vSphere resource management or Storage DRS. 🙂 I’ve split them up into two categories, vSphere centric and vCloud centric. The fourth session is the annual Ask the Expert vBloggers with the all-star crew Scott Lowe, Duncan Epping, Rick Scherer, and Chad Sakac. I hope to see you at VMworld!
vSphere centric sessions
Session 1545
Architecting Storage DRS Datastore Clusters
Abstract: In this session Frank Denneman and Valentin Hamburger will cover and explain in great detail what to consider when building a Storage DRS datastore cluster. Introducing the concept of datastore clusters can affect or shift the paradigm of storage management in virtual infrastructures. The goal is to demonstrate the relationship between the datastore cluster and existing objects in the virtual infrastructure and how the introduction of datastore clusters can affect various design decisions. This session is a must for anyone implementing Storage DRS who wants to maximize their cluster and vSphere resource designs.
Session 1683
vSphere Cluster Resource Pools Best Practices
Abstract: In this session Frank Denneman and Rawlinson Rivera will cover and explain in great detail what to consider when using resource pools inside a vSphere cluster. Introducing the concept of resource pools can affect virtual machine performance and overall resource management in virtual infrastructures.
Join Frank and Rawlinson and discover both common pitfalls and best practices of resource pool design. This session is a must for anyone implementing resource pools who wants to maximize their cluster and vSphere resource designs.
vCloud Director centric sessions
Session 1167
vCloud tracks Architecting for vCloud Allocation Models
Abstract: In this session Frank Denneman and Chris Colotti will break down the three vCloud Director allocation models in depth. Each model’s settings will be shown in detail to explain the effect on vSphere resource scheduling. They will then show how allocation models of the same type with different configurations, as well as different allocation models, can live on the same Provider vDC. The goal is to demonstrate that by fully understanding not only the allocation models but also vSphere resource allocation, you can design for multiple allocation models on a single Provider vDC. This session is a must for anyone implementing vCloud Director who wants to maximize their cluster and vCloud resource designs.
Session 504
Ask the Expert vBloggers – Scott Lowe, Duncan Epping, Rick Scherer, Frank Denneman, Chad Sakac
Abstract: One of the highest rated sessions at VMworld is back for its fifth year! Come meet four VMware Certified Design Experts (VCDX) on stage answering your questions. We get the top virtualization bloggers in the industry on stage to answer your questions on a wide array of topics.
Simon at Techhead.co.uk wrote a nice article about how to vote for your favorite session at the VMworld.com portal.