
Partially connected datastore clusters

October 7, 2011 by frankdenneman

The first article in the series about architecture and design decisions focuses on the connectivity of the datastores within the datastore cluster. Connectivity between ESXi hosts and datastores in the datastore cluster affects the initial placement and load balancing decisions made by DRS and Storage DRS. Although connecting a datastore to all ESXi hosts inside a cluster is common practice, we still come across partially connected datastores in virtual environments.
What is the impact of a partially connected datastore that is a member of a datastore cluster connected to a DRS cluster? What interoperability problems can you expect, and what is the impact of this design on DRS and SDRS load balancing operations?
Let’s start with the basic terminology.

Fully connected datastore clusters
A datastore cluster is fully connected when every datastore in it is attached to all ESXi hosts in the DRS cluster. This is a recommendation, but it is not enforced.
Partially connected datastore clusters
If a datastore is connected to a subset of ESXi hosts inside the DRS cluster, the datastore cluster is treated as a partially connected datastore cluster.
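To make the two definitions concrete, here is a minimal Python sketch that classifies a datastore cluster. The host names, datastore names and the function itself are made up for illustration; this is not how vCenter exposes the information.

```python
def classify_connectivity(cluster_hosts, datastore_hosts):
    """Classify a datastore cluster as fully or partially connected.

    cluster_hosts: set of host names in the DRS cluster.
    datastore_hosts: dict mapping datastore name -> set of hosts that mount it.
    """
    for hosts in datastore_hosts.values():
        # A single datastore that is missing on at least one host makes the
        # whole datastore cluster partially connected.
        if not cluster_hosts <= hosts:
            return "partially connected"
    return "fully connected"


# Hypothetical three-host DRS cluster.
cluster = {"esx01", "esx02", "esx03"}
full = {"ds1": {"esx01", "esx02", "esx03"}, "ds2": {"esx01", "esx02", "esx03"}}
partial = {"ds1": {"esx01", "esx02", "esx03"}, "ds2": {"esx01", "esx02"}}

print(classify_connectivity(cluster, full))
print(classify_connectivity(cluster, partial))
```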

Now what happens if the DRS cluster is connected to partially connected datastores? It’s important to understand that the goal of both DRS and SDRS is resource availability, and the key to offering resource availability is to provide as much mobility as possible. SDRS will not generate any migration recommendation that reduces the compatibility of a virtual machine with regard to datastore connections. Virtual machine to host compatibility is captured in compatibility lists.
Compatibility list
Inside the cluster a VM-host compatibility list is generated for each virtual machine. The compatibility list determines which ESXi hosts in the cluster have the network and storage configurations that allow the virtual machine to successfully come online. Membership of mandatory VM-to-host affinity rules is also captured in the compatibility list. If the network portgroup or datastore is not available on the host, or the host is not listed in the host group of the mandatory affinity rule, the ESXi host is deemed incompatible to host that virtual machine.
As mentioned, both DRS and SDRS focus on resource availability and resource outage avoidance; therefore SDRS prefers a datastore that is connected to all hosts over a datastore that is partially connected. Connecting datastores to a subset of hosts shrinks the compatibility list, which limits the mobility of the virtual machine and reduces the efficiency of DRS and SDRS.
Finding a suitable location or load balancing becomes more challenging when the cluster and datastore cluster are partially connected. During initial placement, the selection of a datastore may impact the mobility of the virtual machine amongst the hosts, while the selection of a host impacts the mobility of the virtual machine amongst the datastores in the datastore cluster.
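The three checks described above (network, storage, mandatory affinity rule) can be sketched in a few lines of Python. The `Host` and `VirtualMachine` classes and all names in the example are hypothetical and only illustrate the idea; the actual data structures used by DRS are not public.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Host:
    name: str
    portgroups: Set[str]
    datastores: Set[str]


@dataclass
class VirtualMachine:
    name: str
    portgroups: Set[str]
    datastores: Set[str]
    # Host group of a mandatory VM-to-host affinity rule; None means no rule.
    mandatory_host_group: Optional[Set[str]] = None


def compatibility_list(vm, hosts):
    """Return the names of the hosts that can run the virtual machine."""
    compatible = []
    for host in hosts:
        has_network = vm.portgroups <= host.portgroups
        has_storage = vm.datastores <= host.datastores
        in_host_group = (
            vm.mandatory_host_group is None
            or host.name in vm.mandatory_host_group
        )
        if has_network and has_storage and in_host_group:
            compatible.append(host.name)
    return compatible


hosts = [
    Host("esx01", {"VM Network"}, {"ds1", "ds2"}),
    Host("esx02", {"VM Network"}, {"ds1"}),         # ds2 is not connected
    Host("esx03", {"VM Network"}, {"ds1", "ds2"}),  # not in the host group
]
vm = VirtualMachine("vm01", {"VM Network"}, {"ds2"}, {"esx01", "esx02"})

print(compatibility_list(vm, hosts))
```

In this example only esx01 remains: esx02 lacks the datastore and esx03 is outside the host group of the mandatory rule.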

VM mobility in partially connected datastore clusters

Let’s explore this impact a little bit further. When generating migration recommendations, DRS selects a host for a virtual machine that can provide enough resources to satisfy the virtual machine’s resource entitlement while lowering the imbalance of the cluster. DRS might come across a lightly utilized host while other hosts inside the cluster are highly utilized. Unfortunately the lightly utilized host is not connected to the datastore containing the virtual machine files (it might even be lightly utilized because of its poor connection state), and therefore DRS will not consider the host due to the incompatibility, even though from a DRS resource load balancing perspective this host might be a very attractive option to solve the resource imbalance. Also keep in mind the impact of this behavior on VM-Host affinity rules: DRS will not migrate the virtual machine to a host inside the host group that is not connected to the virtual machine’s datastore.
Something similar happens with SDRS load balancing. Partially connected datastores are not recommended as a destination when fully connected datastores are available that do not violate the SDRS space threshold. You might wonder why the space threshold is explicitly mentioned and not the IO load balancing threshold; that’s because IO load balancing is disabled when a partially connected datastore is detected in the datastore cluster.
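A rough way to picture this preference is a two-step filter: first drop the datastores that violate the space threshold, then prefer the fully connected ones. The Python sketch below follows that idea; the function, the dictionary layout and the fallback to partially connected datastores when nothing else qualifies are my own assumptions, not documented SDRS internals.

```python
def preferred_destinations(datastores, cluster_hosts, space_threshold=0.80):
    """Return candidate destination names, preferring fully connected datastores.

    datastores: list of dicts with 'name', 'hosts' (set of host names) and
    'utilization' (fraction between 0 and 1).
    """
    below_threshold = [
        d for d in datastores if d["utilization"] < space_threshold
    ]
    fully_connected = [
        d for d in below_threshold if cluster_hosts <= d["hosts"]
    ]
    # Assumption: partially connected datastores are only considered when no
    # fully connected datastore stays below the space threshold.
    chosen = fully_connected or below_threshold
    return [d["name"] for d in chosen]


cluster = {"esx01", "esx02"}
datastores = [
    {"name": "ds1", "hosts": {"esx01", "esx02"}, "utilization": 0.60},
    {"name": "ds2", "hosts": {"esx01"}, "utilization": 0.30},
    {"name": "ds3", "hosts": {"esx01", "esx02"}, "utilization": 0.85},
]

print(preferred_destinations(datastores, cluster))
```

Although ds2 is the emptiest datastore, the fully connected ds1 is preferred; ds3 is excluded because it already exceeds the space threshold.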
IO load balancing
It is important to understand the impact a single partially connected datastore has on the service level of an entire datastore cluster. When SDRS detects a partially connected datastore, it disables IO load balancing on the entire datastore cluster, not only on that single partially connected datastore. This effectively degrades a complete feature set of your virtual infrastructure.
Temporary partial connectivity – a real threat?
The connectivity status is important when the SDRS interval expires; SDRS checks the connectivity during the migration recommendation calculation. A temporary all-paths-down status or a rezoning procedure might not have an effect on SDRS load-balancing behavior, but what if good old Murphy decides to pay you a visit during the invocation period? Keep this behavior in mind when scheduling maintenance on the storage platform.
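The all-or-nothing nature of this behavior is easy to express in code. The Python sketch below is purely illustrative: the function name and the returned dictionary are hypothetical, and space balancing is shown as always on because it cannot be disabled separately while SDRS is enabled.

```python
def effective_sdrs_features(io_metric_enabled, cluster_hosts, datastore_hosts):
    """Return which SDRS features are effectively active on a datastore cluster.

    datastore_hosts: dict mapping datastore name -> set of hosts that mount it.
    """
    partially_connected = sorted(
        name
        for name, hosts in datastore_hosts.items()
        if not cluster_hosts <= hosts
    )
    return {
        # Out of space avoidance stays on as long as SDRS itself is enabled.
        "space_balancing": True,
        # One partially connected datastore disables IO load balancing for
        # every member of the datastore cluster.
        "io_load_balancing": io_metric_enabled and not partially_connected,
        "partially_connected": partially_connected,
    }


cluster = {"esx01", "esx02", "esx03"}
members = {
    "ds1": {"esx01", "esx02", "esx03"},
    "ds2": {"esx01", "esx02", "esx03"},
    "ds3": {"esx01", "esx02"},  # esx03 lost its connection
}

print(effective_sdrs_features(True, cluster, members))
```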
Warning messages
SDRS generates a warning and displays it on the SDRS faults tab in the datastores and datastore cluster view.

Benefits of partially connected
We cannot identify any direct benefit of partially connecting a datastore to a cluster. Partially connected datastores impact initial placement, disable IO load balancing and affect DRS load balancing as well as SDRS space balancing. Therefore a basic design decision would be to connect all datastores to all hosts in the cluster connected to the datastore cluster. If anyone has a good reason for not connecting a datastore to all the hosts, please leave a comment.

Architecture and design of Datastore clusters

September 16, 2011 by frankdenneman

Storage DRS extends the DRS feature set to the storage space. The primary element used by SDRS is a datastore cluster. Introducing the concept of datastore clusters can affect or shift the paradigm of storage management in virtual infrastructures. This article is the start of a short series of articles focusing on the design considerations of datastore clusters.
Datastore cluster concept
Let’s start by looking at the concept of the datastore cluster. A datastore cluster can be regarded as the storage equivalent of a vCenter (DRS) cluster, whereas a datastore is the equivalent of an ESXi host. Because a datastore cluster pools storage resources into one single logical pool, it becomes a management object. This storage pooling allows the administrator to manage many individual datastores as one element and, depending on the enabled SDRS features, provides optimized usage of the storage capacity and IO performance capability of all member datastores.
SDRS settings are configured at the datastore cluster level and are applied to each member datastore inside the datastore cluster. When SDRS is enabled, the datastore cluster becomes the storage load-balancing domain, requiring administrators and architects to treat the datastore cluster as a single entity for decision making instead of individual datastores.
Datastore Clusters architecture and design
Although datastore clusters offer an abstraction layer, one must keep in mind the relationship between existing objects like hosts, clusters, virtual machines and virtual disks. This new abstraction layer might even disrupt existing (organizational) processes and policies. Introducing datastore clusters can have an impact on various design decisions such as VMFS datastore sizing, the configuration of the datastore clusters, the variety in datastore clusters and the number of datastore clusters in the virtual infrastructure.
In this series I will address these considerations in more depth. Stay tuned for the first part: the impact of connectivity of datastores in a datastore cluster.
More articles in the architecting and designing datastore clusters series:
Part 2: Partially connected datastore clusters.
Part 3: Impact of load balancing on datastore cluster configuration.
Part 4: Storage DRS and Multi-extents datastores.
Part 5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.

SDRS out of space avoidance

September 13, 2011 by frankdenneman

During VMworld I noticed that a lot of the attendees’ focus was on the IO load balancing feature of Storage DRS (SDRS); however, SDRS is more than only IO load balancing. Both the space load balancing feature and initial placement are just as incredible, powerful and useful as IO load balancing.
Actually the term space load balancing doesn’t really do the algorithm justice, as it sounds like it makes “unnecessary” moves based on space usage, whereas “out of space avoidance” better suits the nature of this SDRS algorithm, because it makes crucial recommendations that in my opinion bring a lot of value.
Initial placement and IO load balancing will be featured in future articles, but in this post I would like to focus on the out of space avoidance feature of SDRS. When SDRS is enabled, it will automatically make recommendations based on space utilization and IO load. IO load balancing can be activated or deactivated by enabling or disabling the option “Enable I/O metric for SDRS recommendations”. SDRS does not offer the option to enable or disable out of space avoidance; out of space avoidance is enabled by default and can only be disabled by disabling SDRS in its entirety.
Thresholds
By default, SDRS monitors datastore space utilization and generates migration recommendations if the datastore utilization exceeds the “space utilization ratio threshold”. This utilized space threshold determines the maximum acceptable space load of the VMFS datastore and is part of the SDRS settings of the datastore cluster. The threshold is set to 80% by default and can be set to any value between 50 and 100 percent.

Space utilization ratio threshold

Be aware that this threshold applies to each datastore that is a member of the datastore cluster. Because the threshold is a percentage, the absolute headroom it leaves depends on the size of the datastore; if you want to have similar absolute space headroom, it is recommended to add similarly sized datastores to the datastore cluster.
If the threshold is reached, SDRS will not migrate a random virtual machine to a random datastore; it needs to adhere to certain rules. Besides running a cost-benefit analysis on the virtual machines registered on the datastore, it also takes the “space utilization ratio difference threshold” into account.
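A quick calculation shows how much the absolute headroom differs between datastores of different sizes at the same threshold. The capacities below are made-up examples.

```python
def headroom_gb(capacity_gb, threshold_pct=80):
    """Free space (in GB) that remains once a datastore hits the threshold."""
    return capacity_gb * (100 - threshold_pct) / 100


for capacity in (500, 1000, 2000):
    print(f"{capacity} GB datastore: {headroom_gb(capacity)} GB headroom at 80%")
```

With the default 80% threshold a 500 GB datastore keeps 100 GB free, while a 2000 GB datastore keeps 400 GB free.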

Space utilization ratio difference threshold

The utilization difference setting allows SDRS to determine which datastores should be considered as destinations for virtual machine migrations. The space utilization ratio difference threshold indicates the required difference in utilization ratio between the source and destination datastores. The difference threshold is an advanced option of the SDRS runtime rules and is set to a default value of 5%. Consequently SDRS will not move any virtual machine disk from an 83% utilized datastore to a 78% utilized datastore. The reason why SDRS uses this setting is to avoid recommending migrations of marginal value.
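The 83% versus 78% example can be written as a simple check. Note that the strict comparison (the difference has to be larger than the threshold) is inferred from that example and is an assumption on my part; the function name is hypothetical.

```python
def is_valid_destination(source_pct, destination_pct, difference_threshold_pct=5):
    """Check whether the utilization gap justifies a migration.

    Assumption: the difference between source and destination must be
    strictly larger than the threshold, as suggested by the 83% to 78% example.
    """
    return source_pct - destination_pct > difference_threshold_pct


print(is_valid_destination(83, 78))  # difference of exactly 5 is rejected
print(is_valid_destination(83, 70))  # difference of 13 is accepted
```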
SDRS also uses space growth rate to avoid risky migrations. A migration is considered risky if it has to be undone in the near future. SDRS defines near future as a time window that is longer than the lead-time of a storage vMotion and defaults to 30 hours. This option cannot be changed in any supported way. Hence, SDRS will avoid moving any virtual machine disk to a datastore that is expected, based on growth rate, to exceed the utilization threshold within the next 30 hours.
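As an illustration of this risk check, the sketch below projects the utilization of a destination datastore 30 hours ahead. The linear projection, the parameter names and the numbers are assumptions for the sake of the example; the source only states that the growth rate and a 30 hour window are used.

```python
def exceeds_threshold_within_window(
    used_gb,
    capacity_gb,
    growth_gb_per_hour,
    incoming_gb=0.0,
    threshold_pct=80.0,
    window_hours=30.0,
):
    """Project destination usage and compare it with the space threshold.

    Assumption: growth is extrapolated linearly over the window.
    """
    projected_gb = used_gb + incoming_gb + growth_gb_per_hour * window_hours
    return projected_gb / capacity_gb * 100 > threshold_pct


# Hypothetical 1000 GB destination with 700 GB in use, receiving a 50 GB disk.
print(exceeds_threshold_within_window(700, 1000, 2.0, incoming_gb=50))
print(exceeds_threshold_within_window(700, 1000, 0.5, incoming_gb=50))
```

In the first case the datastore is projected to hold 810 GB after 30 hours, so the migration would be considered risky; in the second case it is projected to hold 765 GB and stays below the 80% threshold.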
Space Utilization
How does SDRS determine if a VMFS datastore has exceeded the threshold? It does this by comparing the “Space utilization” against the utilized space threshold. Space utilization is determined by dividing the total consumed space on the datastore by the datastore capacity.

Space utilization = total consumed space on the datastore / datastore capacity
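In code, the formula and the comparison against the threshold look like this. The function names and the sample numbers are illustrative only.

```python
def space_utilization(consumed_gb, capacity_gb):
    """Space utilization = total consumed space / datastore capacity."""
    if capacity_gb <= 0:
        raise ValueError("datastore capacity must be positive")
    return consumed_gb / capacity_gb


def exceeds_space_threshold(consumed_gb, capacity_gb, threshold=0.80):
    """True when the datastore utilization is above the space threshold."""
    return space_utilization(consumed_gb, capacity_gb) > threshold


print(space_utilization(850, 1000))
print(exceeds_space_threshold(850, 1000))
print(exceeds_space_threshold(400, 1000))
```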

To determine the space utilization of the datastore, SDRS requires the “per-VMDK” usage statistic and the VMFS datastore usage statistic. The per-VMDK statistic provides SDRS data about the allocated space in the VMDK, while the VMFS datastore statistic provides information about the datastore utilization.

Now this is one of the cooler parts of the algorithm: as mentioned, SDRS takes into account the allocated amount of disk space instead of the provisioned disk space when thin disks are used. SDRS receives the per-VMDK statistics and space utilization per datastore on an ongoing basis. If the utilization exceeds the threshold, the SDRS algorithm is triggered immediately and does not have to wait for its invocation period to complete.
By receiving space utilization information frequently, SDRS is able to understand and trend-map the data growth within the VMDK. The growth rate is estimated using historical usage samples, with recent samples weighing more than older samples. By including this information in the cost-benefit risk analysis, SDRS attempts to avoid migrating virtual machines with data-growth rates that will likely cause the destination datastore to exceed the threshold in the near future. By including the estimated growth rate, SDRS is equipped with an outage avoidance strategy. This outage avoidance intelligence helps most organizations to adopt thin-provisioned disks located in the virtualization stack. You still need to think about the over-subscription level, but SDRS will help to control the environment and avoid outages caused by out-of-space situations as much as possible.
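The exact weighting SDRS applies to its usage samples is not documented here, so the sketch below simply uses an exponentially decaying weight as a stand-in to show what "recent samples weigh more" can look like. The function, the `decay` parameter and the sample data are all assumptions.

```python
def estimate_growth_rate(samples, decay=0.5):
    """Estimate VMDK growth in GB per hour from (hour, used_gb) samples.

    samples must be ordered from oldest to newest. The newest interval gets
    weight 1, the one before it `decay`, then `decay` squared, and so on.
    This weighting scheme is an illustrative assumption.
    """
    if len(samples) < 2:
        return 0.0
    rates = [
        (used_1 - used_0) / (hour_1 - hour_0)
        for (hour_0, used_0), (hour_1, used_1) in zip(samples, samples[1:])
    ]
    weights = [decay ** age for age in range(len(rates) - 1, -1, -1)]
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


# Growth accelerates in the most recent hour: 1, 1 and then 4 GB per hour.
history = [(0, 100), (1, 101), (2, 102), (3, 106)]
print(estimate_growth_rate(history))
```

A plain average of the three intervals would give 2 GB per hour; the weighted estimate ends up higher because the most recent interval counts the most.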
Migration candidate selection
Now how does SDRS know which virtual machines to move? As mentioned before, SDRS uses a cost-benefit risk analysis. Moving virtual machines is expensive, both on the CPU and memory subsystems and on the IO subsystems. Therefore SDRS aims to generate recommendations that have the lowest impact on the environment while delivering improvements and solving any violation. SDRS considers the size of the VMDK (allocated space) and the activity of the IO workload to calculate the cost aspect of the cost-benefit analysis.
When a datastore exceeds the space utilization threshold, SDRS will try to move the required number of megabytes out of the datastore to correct the space utilization violation. In other words, SDRS attempts to select a virtual machine that is closest in size to the amount required to bring the space utilization of the datastore back to the space utilization ratio threshold.
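The size-matching part of this selection can be sketched as follows. This deliberately ignores the IO activity and the rest of the cost-benefit analysis, and the function names and virtual machine sizes are hypothetical.

```python
def required_move_gb(consumed_gb, capacity_gb, threshold=0.80):
    """Amount of data that must leave the datastore to get back to the threshold."""
    return max(0.0, consumed_gb - capacity_gb * threshold)


def pick_candidate(vm_sizes_gb, required_gb):
    """Pick the virtual machine whose allocated size is closest to the required amount.

    Simplification: only size is considered, not IO activity or migration cost.
    """
    return min(vm_sizes_gb, key=lambda name: abs(vm_sizes_gb[name] - required_gb))


# Hypothetical 1000 GB datastore with 860 GB consumed and an 80% threshold.
needed = required_move_gb(860, 1000)
vms = {"vm-a": 200, "vm-b": 65, "vm-c": 70, "vm-d": 20}

print(needed)
print(pick_candidate(vms, needed))
```

Here 60 GB has to leave the datastore, and vm-b (65 GB) is the closest match, so it is selected rather than the much larger vm-a.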
Caveats
Before enabling SDRS on arrays configured to use deduplication or replication technologies, you might want to check with your array vendor for their recommendation on combining SDRS with their technology. Duncan has written an excellent article about the interop between SDRS and various technologies. Please read it if you haven’t already: http://www.yellow-bricks.com/2011/07/15/storage-drs-interoperability/
Key Takeaway
SDRS out of space avoidance alone is reason enough to use SDRS. Having an automated way of distributing virtual machines across your datastore landscape can result in more efficient usage of available storage space, while reducing the management effort. Understanding the outage avoidance measures inside the algorithm might help you to consider using the thin-provisioned VMDK format to decrease the footprint of the virtual machines, speed up SDRS migrations and possibly increase the VM density per datastore.