Considerations when modifying the individual VM automation level
Recently I received some questions about the behavior of DRS when the automation level of an individual virtual machine is modified. DRS allows you to customize the automation level of individual virtual machines to override the DRS cluster automation level. The most common reason for modifying the automation level is to prevent DRS from moving a virtual machine automatically. Selecting an automation level other than the default cluster automation level or fully automated impacts (daily) operational procedures. It might impact cluster balance and/or resource availability if the operational procedures are not adjusted to align with the “new” behavior of DRS when dealing with non-default automation levels. Before continuing with the impact and caveats of a non-default automation level, let’s zoom in on how each level behaves.
Level of automation
There are five automation level modes:
• Fully Automated
• Partially Automated
• Manual
• Disabled
• Default (cluster automation level)
Each automation level behaves differently:
|Automation level|Initial placement|Load balancing|
|Fully Automated|Automatic placement|Automatic execution of migration recommendation|
|Partially Automated|Automatic placement|Migration recommendation is displayed|
|Manual|Recommended host is displayed|Migration recommendation is displayed|
|Disabled|VM powered on at registered host|No migration recommendation generated|
The default automation level is not listed in the table above as it aligns with the cluster automation level. When the automation level of the cluster is modified, the automation level of every virtual machine set to default changes with it.
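The table and the default rule can be captured in a small lookup. This is only an illustrative sketch in Python, not how DRS is implemented; the names `Level`, `BEHAVIOR`, `effective_level` and `behavior` are made up for this example.

```python
from enum import Enum


class Level(Enum):
    FULLY_AUTOMATED = "Fully Automated"
    PARTIALLY_AUTOMATED = "Partially Automated"
    MANUAL = "Manual"
    DISABLED = "Disabled"
    DEFAULT = "Default"


# (initial placement, load balancing), copied from the table above.
BEHAVIOR = {
    Level.FULLY_AUTOMATED: ("Automatic placement",
                            "Automatic execution of migration recommendation"),
    Level.PARTIALLY_AUTOMATED: ("Automatic placement",
                                "Migration recommendation is displayed"),
    Level.MANUAL: ("Recommended host is displayed",
                   "Migration recommendation is displayed"),
    Level.DISABLED: ("VM powered on at registered host",
                     "No migration recommendation generated"),
}


def effective_level(vm_level, cluster_level):
    """A VM set to Default follows whatever the cluster is set to."""
    return cluster_level if vm_level is Level.DEFAULT else vm_level


def behavior(vm_level, cluster_level):
    """Return (initial placement, load balancing) for one virtual machine."""
    return BEHAVIOR[effective_level(vm_level, cluster_level)]


# A VM left on Default inside a fully automated cluster.
placement, balancing = behavior(Level.DEFAULT, Level.FULLY_AUTOMATED)
print(placement, "/", balancing)
```

Changing the cluster level in the call changes the result for a VM on Default, while a VM with an explicit override keeps its own behavior.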
Disabled automation level
If the automation level of a virtual machine is set to disabled, then DRS is disabled entirely for the virtual machine. DRS will not generate a migration recommendation or an initial placement recommendation. The virtual machine will be powered on at its registered host. A powered-on virtual machine with its automation level set to disabled will still impact the DRS load balancing calculation as it consumes cluster resources. During the recommendation calculation, DRS ignores the virtual machines set to the disabled automation level and selects other virtual machines on that host. If DRS must choose between virtual machines set to the automatic automation levels and virtual machines set to the manual automation level, DRS chooses the virtual machines set to automatic as it prefers them over virtual machines set to manual.
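The selection rule described above can be sketched as a filter followed by a preference ordering. This is a simplified illustration, not the actual DRS algorithm: the `VM` type and `migration_candidates` function are hypothetical, and treating both fully and partially automated as the "automatic" levels is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    level: str  # "fully automated", "partially automated", "manual" or "disabled"


def migration_candidates(vms_on_host):
    """Drop disabled VMs, then list automatic VMs ahead of manual ones."""
    eligible = [vm for vm in vms_on_host if vm.level != "disabled"]
    # sorted() is stable and False sorts before True, so non-manual VMs
    # come first and the original order is kept within each group.
    return sorted(eligible, key=lambda vm: vm.level == "manual")


host = [
    VM("db01", "disabled"),
    VM("app01", "manual"),
    VM("web01", "fully automated"),
]
print([vm.name for vm in migration_candidates(host)])
```

Note that db01 never shows up as a candidate, even though the resources it consumes still count toward the load on its host.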
Manual automation level
When a virtual machine is configured with the manual automation level, DRS generates both initial placement and load balancing migration recommendations; however, the user needs to manually approve these recommendations.
Partially automated automation level
DRS automatically places a virtual machine configured with the partially automated automation level; however, it will generate a migration recommendation that requires manual approval.
The impact of manual and partially automated automation levels on cluster load balance
When you select any automation level other than disabled, DRS assumes that the user will manually apply the migration recommendations it generates. This means that DRS will continue to include the virtual machines in the analysis of cluster balance and resource utilization. During the analysis DRS simulates virtual machine moves inside the cluster, and every virtual machine that is not disabled is included in the selection process of migration recommendations. If a particular move of a virtual machine offers the highest benefit, the least amount of cost and the lowest risk, DRS generates a migration recommendation for this move. Because DRS is limited to a specific number of migrations, it might drop a recommendation for a virtual machine that provides almost the same goodness. The problem with this scenario is that the recommended migration might involve a virtual machine configured with a manual automation level, while the virtual machine with near-equal goodness is configured with the default automation level. This should not matter if the user monitors each and every DRS invocation and reviews the migration recommendations when issued. That is unrealistic to expect, as DRS runs every 5 minutes.
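A toy model makes the effect of the migration limit visible. It is a sketch under loud assumptions: goodness is reduced to a single made-up number, `max_moves` stands in for the limit on migrations per invocation, and `Move` and `run_invocation` are hypothetical names, not DRS internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    vm: str
    level: str       # "fully automated", "partially automated" or "manual"
    goodness: float  # made-up scalar; higher means a better move


def run_invocation(moves, max_moves):
    """Split candidate moves into executed, pending approval, and dropped."""
    ranked = sorted(moves, key=lambda m: m.goodness, reverse=True)
    recommended, dropped = ranked[:max_moves], ranked[max_moves:]
    # Only fully automated VMs are migrated without user approval.
    executed = [m for m in recommended if m.level == "fully automated"]
    pending = [m for m in recommended if m.level != "fully automated"]
    return executed, pending, dropped


candidates = [
    Move("vm-manual", "manual", 0.90),
    Move("vm-auto", "fully automated", 0.88),
]
executed, pending, dropped = run_invocation(candidates, max_moves=1)
print("executed:", [m.vm for m in executed])
print("pending: ", [m.vm for m in pending])
print("dropped: ", [m.vm for m in dropped])
```

With a limit of one move, the slightly better move for the manual virtual machine takes the only slot and waits for approval, while the move that could have been executed automatically is dropped. If nobody approves the pending recommendation, nothing moves during this invocation.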
I’ve seen a scenario where a group of virtual machines was configured with manual mode. It resulted in a host becoming a “trap” for the virtual machines during an overcommitted state. The user did not monitor the DRS tab in vCenter and was missing the migration recommendations. This resulted in resource starvation for the virtual machines themselves but, even worse, it impacted multiple other virtual machines inside the cluster. Because DRS generated migration recommendations for the manual virtual machines, it dropped other suitable moves and could not achieve an optimal balance.
Disabled versus partially automated and manual automation levels
Disabling DRS on a virtual machine has a negative impact on other operational processes or resource availability, such as placing a host into maintenance mode or powering on a virtual machine after maintenance. Because DRS selects the registered host, the virtual machine might be powered on a host with limited available resources while more suitable hosts are available. However, the disabled automation level avoids the scenario described in the previous paragraph.
The partially automated level automatically places the virtual machine on the most suitable host, while manual mode recommends placing the virtual machine on the most suitable host available. Partially automated offers the least operational overhead during placement, but can, together with the manual automation level, introduce a lot of overhead during normal operations.
Risk versus reward
Selecting an automation level is almost a risk versus reward game. Setting the automation level to disabled might impact some operational procedures, but allows DRS to ignore the virtual machines when generating migration recommendations and come up with alternative solutions that provide cluster balance as well. Setting the automation level to partially automated or manual will offer you better initial placement and a simpler maintenance mode process, but creates the risk of imbalance or resource starvation when the DRS tab in vCenter is left unmonitored.