VSPHERE 4.1 HA AND DRS BOOK FOR ONLY $19.95
We lowered the price of the vSphere 4.1 HA and DRS technical Deepdive book permanently. As of this week you can obtain one of the coolest books in the virtualization section at Amazon for only $19.95. 30 5-star reviews couldn’t be wrong. Here is just a random selection of two of those 5-star reviews: B. Riley: The term “deepdive” is regularly abused in the technology world these days. There’s nothing more disheartening than walking into a one hour session at a conference entitled deepdive, and finding out that it’s neither deep, nor a dive. It ends up being more like sitting in a couple inches of warm water in a plastic kiddie pool. When these guys say deepdive, they mean it. This book is packed with helpful information from the first, to the last page. Somehow, they even manage to read minds. They know what you’re thinking as a VMware administrator, and they’ll tell you the why, and the best practice. Lots of books have good overviews of HA and DRS, but none goes as deep as this. It’s very well-written, and highly recommended for anyone who is running, or thinking about running an HA/DRS environment. This book is, as Jeremy Clarkson would say, “absolutely brilliant”!
IMPACT OF LOAD BALANCING ON DATASTORE CLUSTER CONFIGURATION
This article is part of the series on the architecture and design of datastore clusters. It zooms in on why it’s recommended to use similar types of disks in a datastore cluster.
In-tier balancing solution
SDRS can be considered an “in-tier” balancing solution, suggesting that a datastore cluster should be populated with datastores that provide similar performance, continuity, capacity or service level. Although it’s not a technical requirement to have similarly configured datastores, using heterogeneous configurations in a datastore cluster can lead to unexpected results. Understanding SDRS’ main goal and the load balancing process can assist you in architecting your datastore cluster.
SDRS load balancing goal
The main focus of SDRS is to correct imbalance from both a space utilization and a latency perspective at the datastore level. SDRS determines the imbalance level (space or latency) of the datastore cluster and migrates one or more virtual machine disks to solve the imbalance. To select an appropriate migration candidate (virtual machine), SDRS relies on device and workload modeling to understand the impact of a workload on the latency of the datastore. In addition, SDRS uses virtual machine statistics and datastore utilization to understand the impact of virtual machine placement on the space utilization of a datastore.
Modeling
Let’s take a closer look at modeling. SDRS captures device performance to create a performance model; by using the SIOC injector and a reference workload it understands and learns the performance of each device. This way SDRS gets a clear picture of the datastores inside the datastore cluster. Workload modeling is used by SDRS to understand and learn the virtual machine workloads inside the datastore cluster. The workload modeling process creates a workload metric of each virtual disk and analyzes the impact of the data points on latency.
SDRS combines and correlates the outcome of device and workload modeling and space utilization into a unified recommendation. This means that when SDRS decides to migrate a specific VMDK, it considers the workload metric of the virtual disk and analyzes the impact of that specific workload on the latency of the destination datastore. If both the IO metric and space utilization functions are enabled on the datastore cluster, SDRS combines the outcome of device modeling, workload modeling and space utilization and weights them according to the violated threshold. Interestingly enough, even when you disable IO load balancing, SDRS attempts to take overall IO statistics into account when finding a suitable datastore.
Impact of load balancing construct on datastore cluster configuration
Although SDRS analyzes devices and each virtual machine’s workload, it is key to understand that SDRS’ main priority is to correct the threshold violation of a datastore. Although it tries to find the most suitable datastore for a specific workload, modeling is still used as a metric to understand and achieve the goal of getting the best overall performance out of the datastore cluster. In other words, modeling is used for balancing the load on the datastores and not to respect specific wishes of a virtual machine disk. In one way you can argue that SDRS load balancing has a somewhat socialistic nature: the benefit for the society (datastores inside a datastore cluster) outweighs the individual need (single virtual machine performance). Let’s look at an example to better understand this concept.
Example scenario
VM1 is running on datastore1. SDRS determined that the normalized load* of datastore1 is 5ms latency. VM2 and VM3 are running on datastore2. SDRS considers datastore2 to have a normalized load of 20ms latency, violating the default threshold of 15ms.
*Normalized load: SDRS aggregates the device modeling and workload modeling into a metric called normalized load.
SDRS moves VM3 to datastore1; at this point the overall latency of datastore2 is reduced to 13ms. However, by moving VM3 to datastore1, the latency of datastore1 increases from 5ms to 12ms. This increase in latency will impact the workload of VM1, but the “society” benefits from the move because afterwards no datastore violates the latency threshold of the SDRS cluster anymore. In this scenario the overall IOPS will be higher, which aligns with the goal of SDRS: utilizing overall capacity and performance.
Note: As this subject is complex enough, I used a very simple example. In this scenario the latency “moved” with the VM. In real life this is not necessarily the case: when a virtual machine is moved, the latency on the destination does not necessarily go up by the same amount as the latency went down on the source.
Load balancing in a heterogeneous configuration
What if the datastore cluster contains a mix of datastores that are backed by different types of disks? For a moment, let’s focus on the performance impact of a heterogeneous configuration. As mentioned before, device and workload modeling helps SDRS to find the most suitable datastore for a specific workload. However, when combining different types of disks, for example SSD, FC and SATA, it is not uncommon to see the fastest datastore fill up first. If one of the smaller SSDs runs out of space, SDRS is required to solve the space utilization threshold violation and will migrate a workload from a faster datastore to a slower datastore, prioritizing space utilization over IO utilization. Although future invocations of the SDRS algorithm might solve the problem by moving VMDKs around to find a more optimal balance, no priority or guarantee can be assigned to a specific virtual disk to avoid a potential decrease in performance of that VMDK.
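Returning to the example scenario with VM1, VM2 and VM3, the arithmetic can be written down in a few lines of Python. This is only an illustrative sketch and not the SDRS algorithm: the function names are hypothetical, and it uses the same simplification as the example, namely that the latency simply “moves” with the virtual machine.

```python
# Illustrative sketch only; function names are hypothetical, not an SDRS API.
LATENCY_THRESHOLD_MS = 15  # default latency threshold used in the example


def violating(loads, threshold=LATENCY_THRESHOLD_MS):
    """Return the datastores whose normalized load exceeds the threshold."""
    return sorted(name for name, ms in loads.items() if ms > threshold)


def move(loads, source, destination, moved_ms):
    """Return the new loads, assuming the latency moves with the VM."""
    after = dict(loads)
    after[source] -= moved_ms
    after[destination] += moved_ms
    return after


# Normalized load per datastore before the migration of VM3.
before = {"datastore1": 5, "datastore2": 20}
# Assumption from the example: VM3 accounts for 7ms of normalized load.
after = move(before, "datastore2", "datastore1", moved_ms=7)

print(violating(before))
print(after)
print(violating(after))
```

VM1 sees its datastore go from 5ms to 12ms, yet after the move the list of violating datastores is empty, which is the outcome SDRS is after.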
Now at this point most of you wonder if VASA and storage profiles can be used in such a configuration to associate specific profiles with virtual machines and make these VMs compliant with specific datastores. SDRS does not incorporate storage profile compliance in the load balancing algorithms, and unfortunately not every storage vendor offers VASA providers for their arrays. Some excellent articles about VASA and Profile driven storage can be found at Yellow-Bricks.com and blogs.vmware.com/vSphere/storage:
VASA: http://blogs.vmware.com/vsphere/2011/08/vsphere-50-storage-features-part-10-vasa-vsphere-storage-apis-storage-awareness.html
Profile driven storage: http://www.yellow-bricks.com/2011/07/13/vsphere-5-0-profile-driven-storage-what-is-it-good-for/
To guarantee specific performance to virtual machines, it is recommended to use similar types of disks to back the datastores of a datastore cluster. This configuration offers a stable and predictable service level to the virtual infrastructure. If multiple types of disks are available, it is recommended to split them and create multiple datastore clusters, each containing a group of identical types of disks.
Previous articles in the SDRS short series Architecture and design of Datastore clusters:
Part1: Architecture and design of datastore clusters
Part2: Partially connected datastore clusters
CYBER MONDAY DEAL!
We have long been fascinated by the whole Black Friday and Cyber Monday craze in the USA. Unfortunately we do not celebrate Thanksgiving in the Netherlands and none of the shops participate in anything similar to Black Friday. This year we thought it was a great idea to participate in some form, and what better way than to offer our vSphere 5 Clustering Technical Deepdive e-book for a price you cannot resist. We just changed the price of the vSphere 5 Clustering Technical Deepdive to $ 4.99, and 3.99 for our European friends. Yes, that is correct…. Less than 5 dollars for over 350 pages of deepdive material. What better way to recover from the madness of Black Friday than to just sit back and relax reading this amazing piece of work? This is most definitely the deal of the year for all virtualization fanatics! Keep in mind that this is a limited offer; on Tuesday the 29th the price will be back to “normal” again.
US – ebook – $ 4.99
UK – ebook – £ 3.99
DE – ebook – € 3.99
FR – ebook – € 3.99
Pick it up, tell your friends / colleagues / family about it… Here are some snippets from Amazon reviews, but with 15 extremely positive reviews, all of them 5 out of 5, you know you can’t go wrong:
NEW JOB ROLE
For the last two years I have enjoyed working as an architect within the PSO organization of VMware, designing and reviewing the most interesting virtual infrastructures in Europe. However, today I signed my new contract, accepting a position within the Technical Marketing team. Starting in December I will focus on resource management and disaster avoidance technologies. My new role allows me to collaborate with the product managers and the R&D organization on products such as DRS, Storage DRS, vMotion, Storage vMotion and FT. My main tasks will be developing best practices, white papers, documentation and technical presentations, and educating field organizations and of course the customers. Although I enjoyed working within the PSO organization, I can’t wait to get started. Thanks to all the people who made my move possible and offered me such an opportunity!
FDM IN MIXED ESX AND VSPHERE CLUSTERS
For the last couple of weeks I’ve been receiving questions about the vSphere HA FDM agent in mixed clusters. When upgrading vCenter to 5.0, each HA cluster will be upgraded to the new FDM-based HA version, and a new FDM agent will be pushed to each ESX server. The new HA version supports ESX(i) 3.5 through ESXi 5.0 hosts. Mixed clusters are supported, so not all hosts have to be upgraded immediately to take advantage of the new features of FDM. Although mixed environments are supported, we do recommend keeping the time you run different versions in a cluster to a minimum.
The FDM agent will be pushed to each host, even if the cluster contains identically configured hosts; for example, a cluster containing only vSphere 4.1 update 1 hosts will still be upgraded to the new HA version. The only time vCenter will not push the new FDM agent to a host is if the host in question is a 3.5 host without the required patch. When using clusters containing 3.5 hosts, it is recommended to first apply patch ESX350-201012401-SG (ESX 3.5) or ESXe350-201012401-I-BG (ESXi) to the hosts before upgrading vCenter to vCenter 5.0. If you still get the following error message, visit VMware knowledgebase article 2001833:
Host ’’ is of type ( ) with build , it does not support vSphere HA clustering features and cannot be part of vSphere HA clusters.
PARTIALLY CONNECTED DATASTORE CLUSTERS
The first article in the series about architecture and design decisions focuses on the connectivity of the datastores within the datastore cluster. Connectivity between ESXi hosts and datastores in the datastore cluster affects initial placement and load balancing decisions made by DRS and Storage DRS. Although connecting a datastore to all ESXi hosts inside a cluster is common practice, we still come across partially connected datastores in virtual environments. What is the impact of a partially connected datastore that is a member of a datastore cluster connected to a DRS cluster? What interoperability problems can you expect, and what is the impact of this design on DRS and SDRS load balancing operations? Let’s start with the basic terminology.
ARCHITECTURE AND DESIGN OF DATASTORE CLUSTERS
Storage DRS extends the DRS feature set to the storage space. The primary element used by SDRS is a datastore cluster. Introducing the concept of datastore clusters can affect or shift the paradigm of storage management in virtual infrastructures. This article is the start of a short series of articles focusing on the design considerations of datastore clusters.
Datastore cluster concept
Let’s start by looking at the concept of a datastore cluster. Datastore clusters can be regarded as the equivalent of DRS clusters: a datastore cluster is the storage equivalent of a vCenter (DRS) cluster, whereas a datastore is the equivalent of an ESXi host. As a datastore cluster pools storage resources into one single logical pool, it becomes a management object. This storage pooling allows the administrator to manage many individual datastores as one element and, depending on the enabled SDRS features, provides optimized usage of the storage capacity and IO performance capability of all member datastores. SDRS settings are configured at the datastore cluster level and are applied to each member datastore inside the datastore cluster. When SDRS is enabled, the datastore cluster becomes the storage load-balancing domain, requiring administrators and architects to treat the datastore cluster as a single entity for decision making instead of as individual datastores.
Datastore clusters architecture and design
Although datastore clusters offer an abstraction layer, one must keep in mind the relationship between existing objects like hosts, clusters, virtual machines and virtual disks. This new abstraction layer might even disrupt existing (organizational) processes and policies. Introducing datastore clusters can have an impact on various design decisions, such as the sizing of VMFS datastores, the configuration of the datastore clusters, the variety in datastore clusters and the number of datastore clusters in the virtual infrastructure. In this series I will address these considerations in more depth.
Stay tuned for the first part; the impact of connectivity of datastores in a datastore cluster. More articles in the architecting and designing datastore clusters series: Part2: Partially connected datastore clusters. Part3: Impact of load balancing on datastore cluster configuration. Part4: Storage DRS and Multi-extents datastores. Part5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.
SDRS OUT OF SPACE AVOIDANCE
During VMworld I noticed that a lot of the attendees’ focus was on the IO load balancing features of Storage DRS (SDRS); however, SDRS is more than only IO load balancing. Both the space load balancing feature and initial placement are just as incredible, powerful and useful as IO load balancing. Actually, the term space load balancing isn’t really doing the algorithm any justice, as it sounds as if it makes “unnecessary” moves around space usage, whereas “out of space avoidance” better suits the nature of this SDRS algorithm, because it will make crucial recommendations that in my opinion bring a lot of value. Initial placement and IO load balancing will be featured in future articles, but in this post I would like to focus on the out of space avoidance feature of SDRS.
When SDRS is enabled, it will automatically make recommendations based on space utilization and IO load. IO load balancing can be activated or deactivated by enabling or disabling the option “Enable I/O metric for SDRS recommendations”. SDRS does not offer the option to enable or disable out of space avoidance; out of space avoidance is enabled by default and can only be disabled by disabling SDRS in its entirety.
Thresholds
By default, SDRS monitors the datastore space utilization and generates migration recommendations if the datastore utilization exceeds the “space utilization ratio threshold”. The utilized space threshold determines the maximum acceptable space load of the VMFS datastore and is a part of the SDRS settings of the datastore cluster. This threshold is set by default to 80% and can be set to any value between 50 and 100 percent.
[Figure: Space utilization ratio threshold]
Be aware that this threshold applies to each datastore that is a member of the datastore cluster; if you want to have similar absolute space headroom, it is recommended to add similar sized datastores to the datastore cluster.
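To see why similar sized datastores give similar absolute headroom, here is a small Python calculation. It is a sketch for illustration only; the function name and the two capacities are made up, and only the 80% default comes from the text above.

```python
def headroom_gb(capacity_gb, threshold_pct=80):
    """Space in GB that can be consumed before the threshold is reached."""
    return capacity_gb * (100 - threshold_pct) / 100


# Two hypothetical member datastores of very different sizes.
for capacity in (500, 2000):
    print(capacity, "GB datastore ->", headroom_gb(capacity), "GB above the threshold")
```

With the same percentage threshold, the 500 GB datastore keeps 100 GB free while the 2000 GB datastore keeps 400 GB free, so the absolute headroom differs by a factor of four.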
If the threshold is reached, SDRS will not migrate a random virtual machine to a random datastore; it needs to adhere to certain rules. Besides running a cost-benefit analysis on the registered virtual machines in the datastore, it also takes the “space utilization ratio difference threshold” into account.
[Figure: Space utilization ratio difference threshold]
The utilization difference setting allows SDRS to determine which datastores should be considered as a destination for virtual machine migrations. The space utilization ratio difference threshold indicates the required difference in utilization ratio between the destination and source datastores. The difference threshold is an advanced option of the SDRS runtime rules and is set to a default value of 5%. Consequently SDRS will not move any virtual machine disk from an 83% utilized datastore to a 78% utilized datastore. The reason why SDRS uses this setting is to avoid recommending migrations of marginal value.
SDRS also uses the space growth rate to avoid risky migrations. A migration is considered risky if it has to be undone in the near future. SDRS defines near future as a time window that is longer than the lead time of a Storage vMotion and defaults to 30 hours. This option cannot be changed in any supported way. Hence, SDRS will avoid moving any virtual machine disk to a datastore that is expected, based on growth rate, to exceed the utilization threshold within the next 30 hours.
Space Utilization
How does SDRS determine if a VMFS datastore has exceeded the threshold? It does this by comparing the “Space utilization” against the utilized space threshold. Space utilization is determined by dividing the total consumed space on the datastore by the datastore capacity.
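The two checks described above, the space utilization threshold and the difference threshold, can be sketched in Python as follows. This is a simplified illustration and not the actual SDRS implementation: the class and function names are hypothetical, the cost-benefit analysis and growth rate are left out, and the strict “greater than” comparison is an assumption chosen so that the 83% to 78% example from the text is rejected.

```python
from dataclasses import dataclass

SPACE_THRESHOLD_PCT = 80      # default space utilization ratio threshold
DIFFERENCE_THRESHOLD_PCT = 5  # default space utilization ratio difference threshold


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    consumed_gb: float

    @property
    def utilization_pct(self) -> float:
        # Space utilization = total consumed space / datastore capacity.
        return 100.0 * self.consumed_gb / self.capacity_gb


def exceeds_threshold(ds, threshold_pct=SPACE_THRESHOLD_PCT):
    """True if the datastore utilization exceeds the space threshold."""
    return ds.utilization_pct > threshold_pct


def is_candidate_destination(source, destination,
                             difference_pct=DIFFERENCE_THRESHOLD_PCT):
    """True if the destination is sufficiently less utilized than the source.

    Assumption: the difference must be strictly larger than the threshold.
    """
    return source.utilization_pct - destination.utilization_pct > difference_pct


source = Datastore("ds-a", capacity_gb=1000, consumed_gb=830)  # 83% utilized
close = Datastore("ds-b", capacity_gb=1000, consumed_gb=780)   # 78% utilized
roomy = Datastore("ds-c", capacity_gb=1000, consumed_gb=700)   # 70% utilized

print(exceeds_threshold(source))
print(is_candidate_destination(source, close))
print(is_candidate_destination(source, roomy))
```

In this sketch ds-a violates the 80% threshold, ds-b is skipped because the move would be of marginal value, and ds-c qualifies as a destination.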
MEM.MINFREEPCT SLIDING SCALE FUNCTION
One of the cool “under the hood” improvements vSphere 5 offers is the sliding scale function of Mem.MinFreePct. Before diving into the sliding scale function, let’s take a look at the Mem.MinFreePct function itself. MinFreePct determines the amount of memory the VMkernel should keep free. This threshold is subdivided into various memory thresholds, i.e. High, Soft, Hard and Low, and is introduced to prevent performance and correctness issues. The threshold for the Low state is required for correctness. In other words, it protects the VMkernel layer from PSODs resulting from memory starvation. The Soft and Hard thresholds are about virtual machine performance and memory starvation prevention. The VMkernel will trigger more drastic memory reclamation techniques when it approaches the Low state.
If the amount of free memory is just a bit less than the Mem.MinFreePct threshold, the VMkernel applies ballooning to reclaim memory. The ballooning memory reclamation technique introduces the least amount of performance impact on the virtual machine by working together with the guest operating system inside the virtual machine; however, there is some latency involved with ballooning. Memory compression helps to avoid hitting the Low state without impacting virtual machine performance, but if memory demand is higher than the VMkernel’s ability to reclaim, a drastic measure is taken to avoid memory exhaustion, and that is swapping. However, swapping will introduce VM performance degradation, and for this reason this reclamation technique is only used when desperate moments require drastic measures. For more information about reclamation techniques I recommend reading the “disable ballooning” article.
vSphere 4.1 allowed the user to change the default MinFreePct value of 6% to a different value and introduced a dynamic threshold for the Soft, Hard and Low states to set appropriate thresholds and prevent virtual machine performance issues while protecting VMkernel correctness.
By default, the vSphere 4.1 thresholds were set to the following values:
UPGRADING VMFS DATASTORES AND SDRS
Among the many cool new features introduced by vSphere 5 is the new VMFS file system for block storage. Although vSphere 5 can use VMFS-3, VMFS-5 is the native VMFS level of vSphere 5 and it is recommended to migrate to the new VMFS level as soon as possible. Jason Boche wrote about the difference between VMFS-3 and VMFS-5.
vSphere 5 offers a pain-free upgrade path from VMFS-3 to VMFS-5. The upgrade is an online and non-disruptive operation which allows the resident virtual machines to continue to run on the datastore. But upgraded VMFS datastores may have an impact on SDRS operations, specifically virtual machine migrations. When upgrading a VMFS datastore from VMFS-3 to VMFS-5, the current VMFS-3 block size will be maintained, and this block size may be larger than the VMFS-5 block size, as VMFS-5 uses a unified 1MB block size. For more information about the difference between native VMFS-5 datastores and upgraded VMFS-5 datastores please read Cormac’s article about the new storage features.
Although the upgraded VMFS file system leaves the block size unmodified, it removes the maximum file size related to a specific block size, so why exactly would you care about having a non-unified block size in your SDRS datastore cluster? In essence, mixing different block sizes in a datastore cluster may lead to a loss in efficiency and an increase in the lead time of a Storage vMotion process. As you may remember, Duncan wrote an excellent post about the impact of different block sizes and the selection of datamovers. To summarize, vSphere 5 offers three datamovers:
• fsdm
• fs3dm
• fs3dm – hardware offload
The following diagram depicts the datamover placement in the stack. Basically, the longer the path the IO has to travel to be handled by a datamover, the slower the process. In the most optimal scenario, you want to leverage the VAAI capabilities of your storage array. vSphere 5 is able to leverage the capabilities of the array, allowing hardware offload of the IO copy.
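The choice between these three datamovers, using the fallback rules described in this article, can be summarized in a short Python sketch. This is only an illustration of the decision logic as explained here, not VMkernel code; the function name and its parameters are hypothetical.

```python
def select_datamover(source_block_size_mb, destination_block_size_mb,
                     vaai_offload_available):
    """Pick a datamover following the rules described in this article.

    Hypothetical helper for illustration; not an ESXi API.
    """
    # Differing block sizes force the legacy datamover.
    if source_block_size_mb != destination_block_size_mb:
        return "fsdm"
    # Same block size and a VAAI capable (and enabled) array: offload the copy.
    if vaai_offload_available:
        return "fs3dm - hardware offload"
    # Same block size without VAAI: software fs3dm.
    return "fs3dm"


# An upgraded VMFS-5 datastore that kept an 8MB block size,
# migrating to a native VMFS-5 datastore with a 1MB block size.
print(select_datamover(8, 1, vaai_offload_available=True))
# Two native VMFS-5 datastores on a VAAI capable array.
print(select_datamover(1, 1, vaai_offload_available=True))
# Two native VMFS-5 datastores on an array without VAAI.
print(select_datamover(1, 1, vaai_offload_available=False))
```

The first call shows the point of this post: even on a VAAI capable array, a deviating block size pushes the migration back to the slowest datamover.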
Most IOs will remain within the storage controller and do not travel up the fabric to the ESXi host. But unfortunately not every array is VAAI capable. If the attached array is not VAAI capable or VAAI is not enabled, vSphere will leverage the FS3DM datamover. FS3DM was introduced in vSphere 4.1 and contains some substantial optimizations so that data does not travel through all stacks. However, if a different block size is used, ESXi reverts to FSDM, commonly known as the legacy datamover. To illustrate the difference in Storage vMotion lead time, read the following article (once again) by Duncan: Storage vMotion performance difference. This article contains the results of a test in which a virtual machine was migrated between two different types of disks, first configured with deviating block sizes and at a different stage with a similar block size. To emphasize: the results illustrate the lead time of the FS3DM datamover and the FSDM datamover. The results below are copied from the Yellow-Bricks.com article: