FDM IN MIXED ESX AND VSPHERE CLUSTERS
Over the last couple of weeks I’ve been receiving questions about the vSphere HA FDM agent in a mixed cluster. When upgrading vCenter to 5.0, each HA cluster will be upgraded to the FDM agent: a new FDM agent will be pushed to each ESX server. The new HA version supports ESX(i) 3.5 through ESXi 5.0 hosts. Mixed clusters are supported, so not all hosts have to be upgraded immediately to take advantage of the new features of FDM. Although mixed environments are supported, we do recommend keeping the time you run different versions in a cluster to a minimum. The FDM agent will be pushed to each host, even if the cluster contains identically configured hosts; for example, a cluster containing only vSphere 4.1 Update 1 hosts will still be upgraded to the new HA version. The only time vCenter will not push the new FDM agent to a host is if the host in question is a 3.5 host without the required patch. When using clusters containing 3.5 hosts, it is recommended to apply patch ESX350-201012401-SG (ESX 3.5) or ESXe350-201012401-I-BG (ESXi 3.5) first, before upgrading vCenter to vCenter 5.0. If you still get the following error message: “Host ’’ is of type ( ) with build , it does not support vSphere HA clustering features and cannot be part of vSphere HA clusters”, visit VMware knowledge base article 2001833.
PARTIALLY CONNECTED DATASTORE CLUSTERS
The first article in the series about datastore cluster architecture and design decisions focuses on the connectivity of the datastores within the datastore cluster. Connectivity between ESXi hosts and datastores in the datastore cluster affects initial placement and load balancing decisions made by DRS and Storage DRS. Although connecting a datastore to all ESXi hosts inside a cluster is common practice, we still come across partially connected datastores in virtual environments. What is the impact of a partially connected datastore that is a member of a datastore cluster connected to a DRS cluster? What interoperability problems can you expect, and what is the impact of this design on DRS and SDRS load balancing operations? Let’s start with the basic terminology.
ARCHITECTURE AND DESIGN OF DATASTORE CLUSTERS
Storage DRS extends the DRS feature set to storage. The primary element used by SDRS is the datastore cluster. Introducing the concept of datastore clusters can shift the paradigm of storage management in virtual infrastructures. This article is the start of a short series focusing on the design considerations of datastore clusters.

Datastore cluster concept
Let’s start by looking at the datastore cluster concept. Datastore clusters can be regarded as the storage equivalent of DRS clusters: a datastore cluster is the storage equivalent of a vCenter (DRS) cluster, whereas a datastore is the equivalent of an ESXi host. As datastore clusters pool storage resources into one single logical pool, the cluster becomes a management object. This storage pooling allows the administrator to manage many individual datastores as one element and, depending on the enabled SDRS features, provides optimized usage of the storage capacity and IO performance capability of all member datastores. SDRS settings are configured at the datastore cluster level and are applied to each member datastore inside the datastore cluster. When SDRS is enabled, the datastore cluster becomes the storage load-balancing domain, requiring administrators and architects to treat the datastore cluster as a single entity for decision making instead of individual datastores.

Datastore cluster architecture and design
Although datastore clusters offer an abstraction layer, one must keep in mind the relationship between existing objects such as hosts, clusters, virtual machines and virtual disks. This new abstraction layer might even disrupt existing (organizational) processes and policies. Introducing datastore clusters can have an impact on various design decisions, such as VMFS datastore sizing, the configuration of the datastore clusters, the variety in datastore clusters and the number of datastore clusters in the virtual infrastructure. In this series I will address these considerations in more depth.
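To make the pooling idea concrete, here is a minimal sketch of a datastore cluster as a single management object aggregating its members. The class and attribute names are illustrative assumptions for this article, not a vSphere API.

```python
# Illustrative sketch: a datastore cluster aggregates member datastores
# into one logical pool (names are hypothetical, not vSphere objects).

class Datastore:
    def __init__(self, name, capacity_gb, used_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.used_gb = used_gb

class DatastoreCluster:
    def __init__(self, members):
        self.members = members

    @property
    def capacity_gb(self):
        # The pool exposes the aggregate capacity of all members.
        return sum(d.capacity_gb for d in self.members)

    @property
    def free_gb(self):
        # Free space is likewise reported for the pool as a whole.
        return sum(d.capacity_gb - d.used_gb for d in self.members)
```

The point of the sketch is the shift in management granularity: the administrator reasons about the pool's aggregate capacity and free space, while SDRS decides which individual member datastore backs each virtual disk.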
Stay tuned for the first part: the impact of connectivity of datastores in a datastore cluster. More articles in the architecting and designing datastore clusters series: Part 2: Partially connected datastore clusters. Part 3: Impact of load balancing on datastore cluster configuration. Part 4: Storage DRS and multi-extent datastores. Part 5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.
SDRS OUT OF SPACE AVOIDANCE
During VMworld I noticed that a lot of attendee focus was on the IO load balancing features of Storage DRS (SDRS); however, SDRS is more than just IO load balancing. Both the space load balancing feature and initial placement are just as powerful and useful as IO load balancing. Actually, the term “space load balancing” doesn’t really do the algorithm justice, as it sounds as if it makes “unnecessary” moves based on space usage, whereas “out of space avoidance” better suits the nature of this SDRS algorithm, because it makes crucial recommendations that in my opinion bring a lot of value. Initial placement and IO load balancing will be featured in future articles, but in this post I would like to focus on the out of space avoidance feature of SDRS. When SDRS is enabled, it will automatically make recommendations based on space utilization and IO load. IO load balancing can be activated or deactivated with the option “Enable I/O metric for SDRS recommendations”. SDRS does not offer an option to enable or disable out of space avoidance; it is enabled by default and can only be disabled by disabling SDRS in its entirety.

Thresholds
By default, SDRS monitors datastore space utilization and generates migration recommendations if the datastore utilization exceeds the “space utilization ratio threshold”. This utilized space threshold, part of the SDRS settings of the datastore cluster, determines the maximum acceptable space load of a VMFS datastore. It is set by default to 80% and can be set to any value between 50 and 100 percent.

[Figure: Space utilization ratio threshold]

Be aware that this threshold applies to each datastore that is a member of the datastore cluster; if you want similar absolute space headroom, it is recommended to add similar-sized datastores to the datastore cluster.
If the threshold is reached, SDRS will not migrate a random virtual machine to a random datastore; it needs to adhere to certain rules. Besides running a cost-benefit analysis on the virtual machines registered on the datastore, it also takes the “space utilization ratio difference threshold” into account.

[Figure: Space utilization ratio difference threshold]

The utilization difference setting allows SDRS to determine which datastores should be considered as destinations for virtual machine migrations. The space utilization ratio difference threshold indicates the required difference in utilization ratio between the destination and source datastores. The difference threshold is an advanced option of the SDRS runtime rules and is set to a default value of 5%. Consequently, SDRS will not move any virtual machine disk from an 83% utilized datastore to a 78% utilized datastore. The reason SDRS uses this setting is to avoid recommending migrations of marginal value. SDRS also uses the space growth rate to avoid risky migrations. A migration is considered risky if it would have to be undone in the near future. SDRS defines near future as a time window that is longer than the lead time of a Storage vMotion; it defaults to 30 hours and cannot be changed in any supported way. Hence, SDRS will avoid moving any virtual machine disk to a datastore that is expected, based on its growth rate, to exceed the utilization threshold within the next 30 hours.

Space Utilization
How does SDRS determine whether a VMFS datastore has exceeded the threshold? It does this by comparing the “space utilization” against the utilized space threshold. Space utilization is determined by dividing the total consumed space on the datastore by the datastore capacity.
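The checks described above can be sketched in a few lines. This is a hypothetical model of the decision logic as described in this post, not VMware's actual implementation; the function names and the simplified linear growth model are my own assumptions. Utilization is expressed in percentage points, matching the examples in the text.

```python
# Hypothetical sketch of SDRS out-of-space avoidance checks (not VMware code).
# Utilization values are percentages (e.g. 83 for 83%).

def space_utilization_pct(consumed_gb, capacity_gb):
    """Space utilization = total consumed space / datastore capacity."""
    return 100.0 * consumed_gb / capacity_gb

def should_consider_migration(source_pct, dest_pct,
                              threshold=80, min_difference=5):
    """A move is only considered when the source exceeds the utilized space
    threshold (default 80%) AND the destination is more than the
    'space utilization ratio difference threshold' (default 5%) lower.
    An 83% -> 78% move (exactly 5% apart) is therefore rejected."""
    return source_pct > threshold and (source_pct - dest_pct) > min_difference

def dest_safe_for_30h(dest_pct, growth_pct_per_hour,
                      threshold=80, window_hours=30):
    """Avoid risky migrations: the destination must not be expected to
    exceed the threshold within the next 30 hours, assuming (for this
    sketch) a simple linear growth rate."""
    projected = dest_pct + growth_pct_per_hour * window_hours
    return projected <= threshold
```

For example, a disk on an 83% full datastore would not be recommended for a 78% full destination, but a 77% full destination qualifies, provided its growth rate keeps it under 80% for the next 30 hours.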
MEM.MINFREEPCT SLIDING SCALE FUNCTION
One of the cool “under the hood” improvements vSphere 5 offers is the sliding scale function of Mem.MinFreePct. Before diving into the sliding scale function, let’s take a look at the Mem.MinFreePct setting itself. MinFreePct determines the amount of memory the VMkernel should keep free. This threshold is subdivided into various memory states, i.e. High, Soft, Hard and Low, and was introduced to prevent performance and correctness issues. The threshold for the Low state is required for correctness; in other words, it protects the VMkernel layer from PSODs resulting from memory starvation. The Soft and Hard thresholds are about virtual machine performance and memory starvation prevention. The VMkernel will trigger more drastic memory reclamation techniques as it approaches the Low state. If the amount of free memory is just a bit less than the Mem.MinFreePct threshold, the VMkernel applies ballooning to reclaim memory. Ballooning introduces the least performance impact on the virtual machine because it works together with the guest operating system inside the virtual machine; however, there is some latency involved with ballooning. Memory compression helps to avoid hitting the Low state without impacting virtual machine performance, but if memory demand is higher than the VMkernel’s ability to reclaim, drastic measures are taken to avoid memory exhaustion, and that measure is swapping. Swapping, however, will degrade virtual machine performance, and for this reason this reclamation technique is only used when desperate times call for drastic measures. For more information about reclamation techniques I recommend reading the “disable ballooning” article. vSphere 4.1 allowed the user to change the default MinFreePct value of 6% to a different value, and introduced dynamic thresholds for the Soft, Hard and Low states to set appropriate thresholds and prevent virtual machine performance issues while protecting VMkernel correctness.
By default, the vSphere 4.1 thresholds were set to the following values:
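The vSphere 5 sliding scale replaces the flat 6% with a banded calculation, so large-memory hosts no longer reserve an excessive amount of free memory. A minimal sketch, assuming the commonly documented band percentages (6% of the first 4GB, 4% of the next 8GB, 2% of the next 16GB, 1% of the remainder); treat these numbers as my assumption rather than an official specification:

```python
# Sketch of the vSphere 5 Mem.MinFreePct sliding scale. The per-band
# percentages below are assumptions based on commonly documented values.

def min_free_mb(total_mb):
    """Free-memory target: 6% of the first 4GB, 4% of 4-12GB,
    2% of 12-28GB, and 1% of everything above 28GB."""
    bands = [(4096, 0.06), (8192, 0.04), (16384, 0.02), (float("inf"), 0.01)]
    free = 0.0
    remaining = total_mb
    for band_size_mb, pct in bands:
        portion = min(remaining, band_size_mb)
        free += portion * pct
        remaining -= portion
        if remaining <= 0:
            break
    return free
```

On a 96GB host this works out to roughly 1.6GB kept free, versus almost 6GB under the flat 4.1-style 6% rule, which is exactly the point of the sliding scale.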
UPGRADING VMFS DATASTORES AND SDRS
Among the many cool new features introduced by vSphere 5 is the new VMFS file system for block storage. Although vSphere 5 can use VMFS-3, VMFS-5 is the native VMFS version of vSphere 5 and it is recommended to migrate to the new VMFS version as soon as possible. Jason Boche wrote about the differences between VMFS-3 and VMFS-5. vSphere 5 offers a pain-free upgrade path from VMFS-3 to VMFS-5: the upgrade is an online, non-disruptive operation that allows the resident virtual machines to continue to run on the datastore. But upgraded VMFS datastores may have an impact on SDRS operations, specifically virtual machine migrations. When upgrading a datastore from VMFS-3 to VMFS-5, the current VMFS-3 block size is maintained, and this block size may be larger than the VMFS-5 block size, as VMFS-5 uses a unified 1MB block size. For more information about the differences between native VMFS-5 datastores and upgraded VMFS-5 datastores, please read Cormac’s article about the new storage features. Although the upgraded VMFS file system leaves the block size unmodified, it removes the maximum file size tied to a specific block size, so why exactly would you care about having a non-unified block size in your SDRS datastore cluster? In essence, mixing different block sizes in a datastore cluster may lead to a loss in efficiency and an increase in the lead time of a Storage vMotion process. As you may remember, Duncan wrote an excellent post about the impact of different block sizes and the selection of datamovers. To summarize, vSphere 5 offers three datamovers:

• fsdm
• fs3dm
• fs3dm – hardware offload

The following diagram depicts the datamover placement in the stack. Basically, the longer the path the IO has to travel before being handled by a datamover, the slower the process. In the most optimal scenario, you want to leverage the VAAI capabilities of your storage array. vSphere 5 is able to leverage the capabilities of the array, allowing hardware offload of the IO copy.
Most IOs will remain within the storage controller and do not travel up the fabric to the ESXi host. Unfortunately, not every array is VAAI capable. If the attached array is not VAAI capable or VAAI is not enabled, vSphere will leverage the FS3DM datamover. FS3DM was introduced in vSphere 4.1 and contained some substantial optimizations so that data does not travel through all stacks. However, if a different block size is used, ESXi reverts to FSDM, commonly known as the legacy datamover. To illustrate the difference in Storage vMotion lead time, read the following article (once again) by Duncan: Storage vMotion performance difference. This article contains the results of a test in which a virtual machine was migrated between two different types of disks configured with deviating block sizes and, at a different stage, with a similar block size. To emphasize: the results illustrate the lead times of the FS3DM datamover and the FSDM datamover. The results below are copied from the Yellow-Bricks.com article:
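The selection behavior described above boils down to two questions: do the block sizes match, and can the array offload the copy? Here is a small sketch of that decision, purely as a model of the behavior described in this post; the function and return labels are illustrative, not the actual VMkernel interface.

```python
# Simplified model of datamover selection for Storage vMotion, based on
# the behavior described above (names are illustrative, not VMkernel code).

def select_datamover(src_block_kb, dst_block_kb, vaai_offload_available):
    # Mismatched block sizes force the legacy datamover: the data travels
    # all the way up the stack through the ESXi host.
    if src_block_kb != dst_block_kb:
        return "fsdm"
    # Matching block sizes with VAAI: the copy is offloaded to the array.
    if vaai_offload_available:
        return "fs3dm-hardware-offload"
    # Matching block sizes without VAAI: optimized software datamover.
    return "fs3dm"
```

This is why an upgraded VMFS-5 datastore with a legacy 8MB block size sitting next to native 1MB VMFS-5 datastores quietly drops every Storage vMotion between them to the slowest path, even on a VAAI-capable array.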
MULTI-NIC VMOTION SUPPORT IN VSPHERE 5.0
There are some fundamental changes to vMotion scalability and performance in vSphere 5.0; one of the most visible is multi-NIC vMotion support. In vSphere 5.0, vMotion is capable of using multiple NICs concurrently to decrease the lead time of a vMotion operation. With multi-NIC support, even a single vMotion can leverage all of the configured vMotion NICs, contrary to previous ESX releases where only a single NIC was used. Allocating more bandwidth to the vMotion process will result in faster migration times, which in turn affects the DRS decision model. DRS evaluates the cluster and recommends migrations based on demand and the cluster balance state. This process is repeated each invocation period. To minimize CPU and memory overhead, DRS limits the number of migration recommendations per DRS invocation period. Ultimately, there is no advantage in recommending more migrations than can be completed within a single invocation period. On top of that, the demand could change after an invocation period, which would render the previous recommendations obsolete. vCenter calculates the limit per host based on the average time per migration, the number of simultaneous vMotions and the length of the DRS invocation period (PollPeriodSec).

PollPeriodSec: By default, PollPeriodSec – the length of a DRS invocation period – is 300 seconds, but it can be set to any value between 60 and 3600 seconds. Shortening the interval will likely increase the overhead on vCenter due to additional cluster balance computations. It also reduces the number of allowed vMotions due to the smaller time window, resulting in longer periods of cluster imbalance. Increasing the PollPeriodSec value decreases the frequency of cluster balance computations on vCenter and allows more vMotion operations per cycle. Unfortunately, this may also leave the cluster in a state of imbalance for longer due to the prolonged evaluation cycle.
Estimated total migration time: DRS considers the average migration time observed from previous migrations. The average migration time depends on many variables, such as source and destination host load, active memory in the virtual machine, link speed, available bandwidth and latency of the physical network used by the vMotion process.

Simultaneous vMotions: Similar to vSphere 4.1, vSphere 5 allows 8 concurrent vMotions on a single host with 10GbE capabilities. For 1GbE, the limit is 4 concurrent vMotions.

Design considerations
When designing a virtual infrastructure leveraging converged networking or Quality of Service to impose bandwidth limits, please remember that vCenter determines the vMotion limits based on the reported link speed of the physical NICs used as vMotion uplinks. In other words, if the physical NIC reports at least 10GbE link speed, vCenter allows 8 concurrent vMotions, but if the physical NIC reports less than 10GbE but at least 1GbE, vCenter allows a maximum of 4 concurrent vMotions on that host. For example, HP Flex technology sets a hard limit on the FlexNICs, resulting in a reported link speed equal to or less than the bandwidth configured at the Flex virtual connect level. I’ve come across many Flex environments configured with more than 1Gb of bandwidth, ranging from 2Gb to 8Gb. Although they will offer more bandwidth per vMotion process, they will not offer an increase in the number of concurrent vMotions. Therefore, when designing a DRS cluster, take the possibilities of vMotion into account and how vCenter determines the number of concurrent vMotion operations. By providing enough bandwidth, the cluster can reach a balanced state more quickly, resulting in better resource allocation (performance) for the virtual machines.

**disclaimer: this article contains out-takes of our book: vSphere 5 Clustering Technical Deepdive**
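The interplay of link speed, PollPeriodSec and average migration time can be sketched as a back-of-the-envelope calculation. The formula is my own illustrative assumption built from the three factors named above, not the actual vCenter algorithm; the point is how link speed caps concurrency regardless of configured bandwidth.

```python
# Back-of-the-envelope sketch of per-host vMotion limits per DRS invocation
# period (formula is an illustrative assumption, not vCenter's algorithm).

def concurrent_vmotion_limit(link_speed_gbps):
    """Concurrent vMotions per host, from the NIC's *reported* link speed:
    at least 10GbE allows 8; at least 1GbE (but under 10) allows 4."""
    if link_speed_gbps >= 10:
        return 8
    if link_speed_gbps >= 1:
        return 4
    return 0

def migrations_per_invocation(poll_period_sec, avg_migration_sec,
                              link_speed_gbps):
    """Rough upper bound: the number of sequential 'waves' of concurrent
    vMotions that fit into one DRS invocation period."""
    waves = poll_period_sec // avg_migration_sec
    return int(waves) * concurrent_vmotion_limit(link_speed_gbps)
```

Note the HP Flex case from the text: a FlexNIC reporting 8Gb still falls in the "at least 1GbE, under 10GbE" bucket and gets only 4 concurrent vMotions, despite carrying eight times the bandwidth of a plain 1GbE uplink.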
BLACK AND WHITE EDITION CLUSTERING DEEPDIVE AVAILABLE
It looks like Amazon is getting its act together. As of now, the Black and White paperback edition is available at Amazon.com. Get it here: VMware vSphere Clustering Technical Deepdive. We are still waiting for the Full Color edition to become available, but hey, it’s a start :)
VMWARE VSPHERE 5 CLUSTERING TECHNICAL DEEPDIVE
As of today, the paperback versions of the VMware vSphere 5 Clustering Technical Deepdive are available at Amazon. We took the feedback into account when creating this book and are offering both a Full Color version and a Black and White edition. Initially we planned to release an ebook and a Full Color version only, but due to the high production cost associated with full-color publishing, we decided to add a Black and White edition to the line-up as well. At this stage we do not have plans to produce any other formats. As this is a self-published release, we developed, edited and created everything from scratch. Writing and publishing a book based on new technology has a serious impact on one’s life, reducing every social contact to a minimum, even family life. Because of this, our focus is not on releasing additional formats such as iBooks or Nook at this moment. Maybe at a later stage, but VMworld is already knocking on our door, so little time is left to spend with our families. When producing the book, the page count rapidly exceeded 400 pages using the 4.1 HA and DRS layout. As many readers told us they loved the compactness of that book, our goal was to keep the page count increase to a minimum. Adjusting the inner margins of the book was the way to increase the amount of space available for the content. A tip for all who want to start publishing: get accustomed to publisher jargon early in the game; it will save you many failed proof prints! We believe we got the right balance between white space and content in the book, reducing the number of pages while still offering the best reading experience. Nevertheless, the number of pages grew from 219 to 348. While writing the book we received a lot of help, and although Duncan listed all the people in his initial blog, I want to take a moment to thank them again.
First of all I want to thank my co-author Duncan for his hard work creating content, but also for spending countless hours on communication with engineering and management. Anne Holler – DRS and SDRS engineer – really went out of her way to help us understand the products. I frequently received long and elaborate replies regardless of the time and day. Thanks Anne! Next up is Doug – it’s “number”, Frank, not “amount”! – Baer. I think most of the time Doug’s comments equaled the amount of content inside the documents. Your commitment to improving the book impressed us very much. Gabriel Tarasuk-Levin, thanks for helping me understand the intricacies of vMotion. A special thanks goes out to our technical reviewers and editors: Keith Farkas and Elisha Ziskind (HA engineering), Irfan Ahmad and Rajesekar Shanmugam (DRS and SDRS engineering), Puneet Zaroo (VMkernel scheduling), Ali Mashtizadeh, Doug Fawley and Divya Ranganathan (EVC engineering). Thanks for keeping us honest and contributing to this book. I also want to thank the VMware management team for supporting us on this project. Doug “VEEAM” Hazelman, thanks for writing the foreword!

Availability
This weekend Amazon made both the Black and White edition and the Full Color edition available. Amazon lists the Black and White edition as: VMware vSphere 5 Clustering Technical Deepdive (Volume 2) [Paperback], whereas the Full Color edition is listed with “Full Color” in its subtitle.
Or select the following links to go to the desired product page: Black and White paperback $29.95. Full Color paperback $49.95. For people interested in the ebook: VMware vSphere 5 Clustering Technical Deepdive (price might vary based on location). If you prefer a European distributor, ComputerCollectief has both books available: Black and White edition: http://www.comcol.nl/detail/74615.htm Full Color edition: http://www.comcol.nl/detail/74616.htm Pick it up, leave a comment and of course feel free to make those great mugshots again and ping them over via Facebook or our Twitter accounts! For those looking to buy in bulk (>20), contact clusteringdeepdive@gmail.com.
AMAZON INDEXING PROBLEMS
All week, Amazon has been unable to properly index either of the vSphere 5 Clustering Technical Deepdive editions. We are working with CreateSpace to fix these problems. In the meantime, both the Full Color and the Black and White edition can be ordered at CreateSpace: Black and White: https://www.createspace.com/3641804 $29.95 Full Color: https://www.createspace.com/3586911 $49.95 An update follows as soon as Amazon lists the paperbacks.