Frank Denneman - Chief Technologist AI at VMware

Storage DRS enables SIOC on datastores only if I/O load balancing is enabled

August 1, 2012 by frankdenneman

Lately, I’ve received some comments asking why I don’t include SIOC in my articles when talking about space load balancing. The reason is that Storage DRS only enables SIOC on each datastore inside the datastore cluster if I/O load balancing is enabled. When you don’t enable I/O load balancing during the initial setup of the datastore cluster, SIOC is left disabled.
Keep in mind that when I/O load balancing has been enabled on the datastore cluster and you later disable the I/O load balancing feature, SIOC remains enabled on all datastores within the cluster.
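The one-way nature of this behavior can be illustrated with a small toy model. This is only a sketch of the rule described above; the class and method names are made up for illustration and do not correspond to any vSphere API.

```python
from dataclasses import dataclass, field


@dataclass
class DatastoreCluster:
    """Hypothetical model of a datastore cluster; not a vSphere object."""
    datastores: list
    io_load_balancing: bool = False
    sioc_enabled: dict = field(default_factory=dict, init=False)

    def __post_init__(self):
        # SIOC is only switched on per datastore when I/O load balancing is enabled.
        self.sioc_enabled = {ds: self.io_load_balancing for ds in self.datastores}

    def set_io_load_balancing(self, enabled):
        self.io_load_balancing = enabled
        if enabled:
            for ds in self.datastores:
                self.sioc_enabled[ds] = True
        # Disabling I/O load balancing deliberately leaves SIOC untouched.


cluster = DatastoreCluster(["ds01", "ds02"])
cluster.set_io_load_balancing(True)
cluster.set_io_load_balancing(False)
print(cluster.sioc_enabled)  # SIOC stays enabled on both datastores
```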

Considerations when modifying the individual VM automation level

July 27, 2012 by frankdenneman

Recently I received some questions about the behavior of DRS when the automation level of an individual virtual machine is modified. DRS allows customization of the automation levels for individual virtual machines to override the DRS cluster automation level. The most common reason for modifying the automation level is to prevent DRS from moving a virtual machine automatically. Selecting an automation level other than the default cluster automation level or fully automated impacts (daily) operational procedures. It might impact cluster balance and/or resource availability if the operational procedures are not adjusted to align with the “new” behavior of DRS when dealing with non-default automation levels. Before continuing with the impact and caveats of a non-default automation level, let’s zoom in on their behavior.
Level of automation
There are five automation level modes:
• Fully Automated
• Partially Automated
• Manual
• Default
• Disabled
Each automation level behaves differently:

Automation level	Initial placement	Load Balancing
Fully Automated	Automatic Placement	Automatic execution of migration recommendation
Partially Automated	Automatic Placement	Migration recommendation is displayed
Manual	Recommended host is displayed	Migration recommendation is displayed
Disabled	VM powered-on on registered host	No migration recommendation generated

The default automation level is not listed in the table above as it aligns with the cluster automation level. When the automation level of the cluster is modified, the individual automation level is modified as well.
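As a rough illustration, the table and the "default follows the cluster" rule can be written as a lookup. The names Level, BEHAVIOR, effective_level and behavior are hypothetical and only mirror the table above; this is not how DRS exposes these settings.

```python
from enum import Enum


class Level(Enum):
    FULLY_AUTOMATED = "Fully Automated"
    PARTIALLY_AUTOMATED = "Partially Automated"
    MANUAL = "Manual"
    DEFAULT = "Default"
    DISABLED = "Disabled"


# (initial placement, load balancing), copied from the table above.
BEHAVIOR = {
    Level.FULLY_AUTOMATED: ("Automatic Placement",
                            "Automatic execution of migration recommendation"),
    Level.PARTIALLY_AUTOMATED: ("Automatic Placement",
                                "Migration recommendation is displayed"),
    Level.MANUAL: ("Recommended host is displayed",
                   "Migration recommendation is displayed"),
    Level.DISABLED: ("VM powered-on on registered host",
                     "No migration recommendation generated"),
}

# A cluster itself can only be manual, partially automated or fully automated.
CLUSTER_LEVELS = {Level.FULLY_AUTOMATED, Level.PARTIALLY_AUTOMATED, Level.MANUAL}


def effective_level(vm_level, cluster_level):
    """Resolve the VM's level; Default simply follows the cluster."""
    if cluster_level not in CLUSTER_LEVELS:
        raise ValueError("not a valid cluster automation level")
    return cluster_level if vm_level is Level.DEFAULT else vm_level


def behavior(vm_level, cluster_level):
    return BEHAVIOR[effective_level(vm_level, cluster_level)]


# A VM left on Default in a fully automated cluster behaves as fully automated.
print(behavior(Level.DEFAULT, Level.FULLY_AUTOMATED))
```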
Disabled automation level
If the automation level of a virtual machine is set to disabled, then DRS is disabled entirely for the virtual machine. DRS will generate neither a migration recommendation nor an initial placement recommendation. The virtual machine will be powered on on its registered host. A powered-on virtual machine with its automation level set to disabled will still impact the DRS load balancing calculation as it consumes cluster resources. During the recommendation calculation, DRS ignores the virtual machines set to the disabled automation level and selects other virtual machines on that host. If DRS must choose between virtual machines set to the automatic automation levels and the manual automation level, DRS chooses the virtual machines set to automatic as it prefers them over virtual machines set to manual.
Manual automation level
When a virtual machine is configured with the manual automation level, DRS generates both initial placement and load balancing migration recommendations; however, the user needs to manually approve these recommendations.
Partially automated level
DRS automatically places a virtual machine with a partially automated level; however, load balancing only results in a migration recommendation, which requires manual approval.
The impact of manual and partially automated levels on cluster load balance
When selecting any automation level other than disabled, DRS assumes that the user will manually apply the migration recommendations it issues. This means that DRS will continue to include the virtual machines in the analysis of cluster balance and resource utilization. During the analysis DRS simulates virtual machine moves inside the cluster, and every virtual machine that is not disabled is included in the selection process of migration recommendations. If a particular move of a virtual machine offers the highest benefit, the least cost and the lowest risk, DRS generates a migration recommendation for this move. Because DRS is limited to a specific number of migrations, it might drop a recommendation for a virtual machine that provides almost similar goodness. The problem with this scenario is that the recommended migration might involve a virtual machine configured with a manual automation level, while the virtual machine with near-level goodness is configured with the default automation level. This should not matter if the user monitors each and every DRS invocation and reviews the migration recommendations when issued. That is unrealistic to expect, as DRS runs every 5 minutes.
I’ve seen a scenario where a group of virtual machines were configured with manual mode. It resulted in a host becoming a “trap” for the virtual machines during an overcommitted state. The user did not monitor the DRS tab in vCenter and was missing the migration recommendations. This resulted in resource starvation for the virtual machines themselves but, even worse, it impacted multiple virtual machines inside the cluster. Because DRS generated migration recommendations for the manual virtual machines, it dropped other suitable moves and could not achieve an optimal balance.
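The effect can be shown with a deliberately simplified sketch. It assumes each candidate move is reduced to a single goodness number and that DRS keeps the top max_moves candidates; both are simplifications for illustration, and the function and parameter names are hypothetical. The real algorithm weighs cost, benefit and risk, and may prefer automatic virtual machines over manual ones.

```python
def select_recommendations(candidates, max_moves):
    """Keep the best moves, skipping virtual machines with DRS disabled.

    candidates: list of (vm_name, automation_level, goodness) tuples.
    """
    eligible = [c for c in candidates if c[1] != "disabled"]
    ranked = sorted(eligible, key=lambda c: c[2], reverse=True)
    return ranked[:max_moves]


candidates = [
    ("vm-a", "manual", 0.92),           # best move, but waits for approval
    ("vm-b", "fully automated", 0.90),  # nearly as good, would run by itself
    ("vm-c", "disabled", 0.99),         # never considered
]

picked = select_recommendations(candidates, max_moves=1)
print(picked)  # [('vm-a', 'manual', 0.92)]

# If nobody approves the recommendation for vm-a, nothing moves this round,
# even though moving vm-b would have helped almost as much.
needs_approval = [vm for vm, level, _ in picked if level != "fully automated"]
print(needs_approval)  # ['vm-a']
```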
For more information about the maximum number of moves, please read this article. Interested in more information about goodness values, please read this article.
Disabled versus partially automated and manual automation levels
Disabling DRS on a virtual machine has some negative impact on other operational processes or resource availability, such as placing a host into maintenance mode or powering on a virtual machine after maintenance. Because DRS selects the registered host, the virtual machine might be powered on on that host even though more suitable hosts are available. However, the disabled automation level avoids the scenario described in the previous paragraph.
The partially automated level automatically places the virtual machine on the most suitable host, while manual mode recommends placing the virtual machine on the most suitable host available. Partially automated offers the least operational overhead during placement, but can, together with the manual automation level, introduce a lot of overhead during normal operations.
Risk versus reward
Selecting an automation level is almost a risk versus reward game. Setting the automation level to disabled might impact some operational procedures, but allows DRS to ignore the virtual machines when generating migration recommendations and come up with alternative solutions that provide cluster balance as well. Setting the automation level to partially automated or manual offers you better initial placement recommendations and a simpler maintenance mode process, but creates the risk of imbalance or resource starvation when the DRS tab in vCenter is left unmonitored.

To which host-level latency statistic is the SIOC congestion threshold related?

July 23, 2012 by frankdenneman

Today someone asked to which host latency statistic the SIOC congestion threshold is related. Is it the Device Average (DAVG), Kernel Average (KAVG) or Guest Average (GAVG)?

Well, actually it’s none of the above. DAVG, KAVG and GAVG are metrics in a host-local centralized scheduler that has complete control over all the requests to the storage system. SIOC’s main purpose is to manage shared storage resources across ESXi hosts, providing allocation of I/O resources independent of the placement of virtual machines accessing the shared datastore. And because it needs to regulate and prioritize access to shared storage that spans multiple ESXi hosts, the congestion threshold is not measured against a single host-side latency metric. But to which metric is it compared? In essence, the congestion threshold is compared with the weighted average of DAVG per host, where the weight is the number of IOPS on that host. Let’s expand on this a bit further.
Average I/O latency
To have an indication of the load of the datastore on the array, SIOC uses the average I/O latency detected by each host connected to that datastore. Average latency across hosts is used to cope with the variety of workloads and the characteristics of the active workloads, such as reads versus writes, I/O size and the degree of sequential I/O, in addition to array behavior such as block location, caching policies and I/O scheduling.
To calculate and normalize the average latency across hosts, each host writes its average device latency and number of I/Os for that datastore in a file called IORMSTATS.SF stored on the same datastore.
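A minimal sketch of the IOPS-weighted average, assuming each host's entry boils down to an (average device latency, IOPS) pair. The function name, the sample numbers and the 30 ms threshold are illustrative values, and the I/O-size normalization that SIOC also applies is left out.

```python
def datastore_wide_latency(host_stats):
    """IOPS-weighted average of the per-host device latency in ms.

    host_stats: iterable of (avg_device_latency_ms, iops) pairs, one per host.
    """
    stats = list(host_stats)
    total_iops = sum(iops for _, iops in stats)
    if total_iops == 0:
        return 0.0  # an idle datastore has no latency to report
    return sum(latency * iops for latency, iops in stats) / total_iops


# Three hosts share the datastore; the busiest host dominates the average.
host_stats = [(10.0, 100), (40.0, 300), (20.0, 0)]
latency = datastore_wide_latency(host_stats)
print(latency)  # 32.5

CONGESTION_THRESHOLD_MS = 30  # illustrative value
print(latency > CONGESTION_THRESHOLD_MS)  # True
```

Note that the first host observes only 10 ms itself, yet the datastore as a whole is above the threshold.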

A common misconception about SIOC is that it’s compute cluster based. The process of determining the datastore-wide average latency reveals the key denominator: the hosts connected to the datastore. All hosts connected to the datastore write to the IORMSTATS.SF file, regardless of cluster membership. Other than enabling SIOC, vCenter is not necessary for normal operations. Each connected host reads the IORMSTATS.SF file every 4 seconds and locally computes the datastore-wide average to use for managing the I/O stream. Therefore cluster membership is irrelevant.
Datastore-wide normalized I/O latency
Back to the process of computing the datastore-wide normalized I/O latency. The average device latencies of each host are normalized by SIOC based on the I/O request size. As mentioned before, not all storage-related workloads are the same. Workloads issuing I/Os with a large request size result in longer device latencies due to the way storage arrays process these workloads. For example, when using a larger I/O request size such as 256KB, the transfer might be broken up by the storage subsystem into multiple 64KB blocks. This operation can lead to a decline in transfer rate and throughput levels, increasing latency. Normalizing on I/O size allows SIOC to differentiate high device latency caused by large requests from actual I/O congestion at the device itself.
Number of I/O requests completed per second
At this point SIOC has normalized the average latency across hosts based on I/O size. The next step is to determine the aggregate number of IOPS accessing the datastore. As each host reports the number of I/O requests completed per second, this metric is used to compare and prioritize the workloads.
I hope this mini deep dive into the congestion threshold explains why it could never be solely related to a single host-side metric. Because the datastore-wide average latency is a normalized value, the latency an individual host observes for the datastore may differ from the latency SIOC reports for the datastore.

Removing the horizontal bar in the footer of a word doc

July 20, 2012 by frankdenneman

Now for something completely different: a tip on how to extend your life by about 5 years, or how to remove the horizontal bar in the footer of a Word document.
Unfortunately I have to deal with the mark-up of Word documents quite frequently and am therefore exposed to the somewhat unique abilities of the headers and footers feature of MS Word. During the editing process of the upcoming book, Word voluntarily added a horizontal bar to my footer. Example depicted below.

However, Word doesn’t allow you to highlight and select a horizontal bar, so it cannot simply be removed by pressing the Delete key.
This means you have to explore the fantastic menu of Word.
To remove the bar:
1. Open the footers section, by clicking in that area in the document.
2. Go to the Format menu.
3. Select Borders and Shading.
4. The Borders and Shading window shows the line that miraculously appeared in my footer. Selecting the option None at the right side of the window removes the horizontal bar from the footer.

5. Click OK
I hope this short tip helps you to keep the frustration to a minimum.

Disabling MinGoodness and CostBenefit

July 9, 2012 by frankdenneman

Over the last couple of months I’ve seen recommendations popping up to change the MinGoodness and CostBenefit settings to zero on a DRS cluster (KB1017291). Usually this happens after a maintenance window in which hosts were placed in maintenance mode: the hosts remain unevenly loaded and DRS won’t migrate virtual machines to the less loaded host.
When these adaptive algorithms are disabled, DRS considers every move and the virtual machines are distributed aggressively across the hosts. Although this sounds very appealing, the MinGoodness and CostBenefit calculations were created for a reason. Let’s explore the DRS algorithm and see why this setting should only be used temporarily and not as a permanent setting.

DRS load balance objectives
The primary objective of DRS is to provide virtual machines their required resources. If a virtual machine is getting the resources it requests (its dynamic entitlement), then there is no need to find a better spot. If the virtual machines do not get the resources specified in their dynamic entitlement, then DRS will consider moving the virtual machine, depending on additional factors.
This means that DRS allows certain situations where the administrator feels the cluster is unbalanced, such as an uneven virtual machine count on hosts inside the cluster. I’ve seen situations where one host was running 80% of the load while the other hosts were running a couple of virtual machines. This particular cluster was comprised of big hosts, each containing 1TB of memory, while the entire virtual machine memory footprint was no more than 800GB. One host could easily run all virtual machines and provide the resources the virtual machines were requesting.
This particular scenario describes the biggest misunderstanding of DRS: DRS is not primarily designed to equally distribute virtual machines across hosts in the cluster. It distributes the load as efficiently as possible across the resources to provide the best performance for the virtual machines. And this is the key to understanding why DRS does or does not generate migration recommendations. Efficiency! Moving virtual machines around costs CPU cycles, memory resources and, to a smaller extent, datastore operations (stunning and unstunning virtual machines). In the most extreme case possible, load balancing itself can be a danger to the performance of virtual machines by withholding resources from them and using those resources to move virtual machines. This is the worst-case scenario, but the main point is that the load balancing process costs resources that could also be used by virtual machines providing their services, which is the primary reason the virtual infrastructure was created. To manage and contain the resource consumption of load balancing operations, the MinGoodness and CostBenefit calculations were created.
CostBenefit
DRS calculates the cost, benefit and risk of a move. Cost: how many resources does it take to move a virtual machine by vMotion? A virtual machine that is constantly updating its large memory footprint costs more CPU cycles and network traffic than a virtual machine with a medium memory footprint that is idling for a while. Benefit: how many resources will it free up on the source host and what will the impact be on the normalized entitlement of the destination host? The normalized entitlement is the sum of the dynamic entitlements of all the virtual machines running on that host divided by the capacity of the host. Risk: a prediction of how the workload might change on both the source and destination host, and whether the outcome of moving the candidate virtual machine is still positive when the workload changes.
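The normalized entitlement definition can be worked through with a few numbers. The host names and MHz values below are invented for illustration, and a single resource type is used for simplicity.

```python
def normalized_entitlement(vm_entitlements, host_capacity):
    """Sum of the VMs' dynamic entitlements divided by the host capacity."""
    if host_capacity <= 0:
        raise ValueError("host capacity must be positive")
    return sum(vm_entitlements) / host_capacity


# Hypothetical hosts: (dynamic entitlement per VM in MHz, host capacity in MHz).
hosts = {
    "esx01": ([4000, 3000, 2000], 10000),
    "esx02": ([1000, 1500], 10000),
}

ne = {name: normalized_entitlement(vms, cap) for name, (vms, cap) in hosts.items()}
print(ne)  # {'esx01': 0.9, 'esx02': 0.25}

# Only hosts with a lower normalized entitlement than the source host
# qualify as a destination.
source = "esx01"
destinations = [name for name, value in ne.items() if value < ne[source]]
print(destinations)  # ['esx02']
```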
MinGoodness
To understand which host the virtual machine must move to, DRS uses the normalized entitlement of the host as the key metric and will only consider hosts that have a lower normalized entitlement than the source host. MinGoodness helps DRS understand what effect the move has on the overall cluster imbalance.
DRS awards every move a CostBenefit and a MinGoodness rating, and these are linked together. DRS will only recommend a move with a negative CostBenefit rating if the move has a highly positive MinGoodness rating. Due to the metrics used, CostBenefit ratings are usually more conservative than the MinGoodness ratings, overruling the decision to move a virtual machine to a host with a lower normalized entitlement because of the cost or risk involved in that particular move.
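The link between the two ratings can be sketched as a simple filter. The integer rating scale, the HIGHLY_POSITIVE cut-off and the function name are assumptions made purely for illustration, as is the choice to accept every move with a non-negative CostBenefit and a positive goodness; the actual DRS rating scale and thresholds are not described here.

```python
HIGHLY_POSITIVE = 2  # hypothetical cut-off for a "highly positive" goodness rating


def should_recommend(cost_benefit, goodness, highly_positive=HIGHLY_POSITIVE):
    """Decide whether a candidate move is worth recommending.

    cost_benefit: rating of the move's cost, benefit and risk (negative = costly).
    goodness: rating of how much the move improves cluster balance.
    """
    if goodness <= 0:
        return False  # the move does not improve the imbalance at all
    if cost_benefit < 0:
        # A costly move is only acceptable when it helps balance a lot.
        return goodness >= highly_positive
    return True  # assumption: cheap moves that help are always accepted


print(should_recommend(cost_benefit=1, goodness=1))   # True
print(should_recommend(cost_benefit=-1, goodness=1))  # False
print(should_recommend(cost_benefit=-1, goodness=2))  # True
```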
When MinGoodness and CostBenefit are set to zero, DRS calculates the cluster imbalance and recommends any move* that improves the balance of the normalized entitlement of each host within the cluster, without considering the resource cost involved. In oversized environments, where resource supply is abundant, setting these options temporarily should not create a problem. In environments where resource demand rivals resource supply, setting these options can create resource starvation.
*The number of recommendations is limited by the MaxMovesPerHost calculation. This article contains more information about MaxMovesPerHost.
Recommendation
My recommendation is to use this advanced option sparingly: only when host load is extremely unbalanced and DRS does not provide any migration recommendation, which is typically after the hosts in the cluster were placed in maintenance mode. Permanently activating this advanced option is similar to lobotomizing the DRS load balancing algorithm. It can do more harm in the long run, as you might see virtual machines in an almost-constant state of vMotion.