FUNNY: HA AND DRS TECHNICAL DEEPDIVE AUDIOBOOK
During a conversation the idea of an audiobook of the HA and DRS book was born. Within a couple of minutes, I found the following in my inbox…. Once upon a time a tiny little VM found himself in a big bad cluster filled with big VMs…… Rapunzel, Rapunzel, let down your high shares! Then the admin installed the LittleBoyBlue patch on the Dike server, and plugged the memory leak. Odysseus set a CPU limit on the Cyclops-VM so low that Cyclops couldn’t even see. “Who are you?” yelled the Cyclops. Odysseus replied, “My name is No One!” When the Cyclops complained to the Scheduler, it asked, “Who has limited you so badly?” “No One has!” replied the Cyclops…. (BTW who makes a creature with one eye? What a horrible single point of failure to bake into your design!) But the third little VM had his very own Resource Pool, and he huffed and he puffed and he outcompeted the much bigger VMs who were all sharing their Resource Pool shares… Let’s just focus on publishing an ebook first…..
IMPACT OF OVERSIZED VIRTUAL MACHINES PART 3
In part 1 of this series on the impact of oversized virtual machines, NUMA architecture, memory overhead reservation and share levels were reviewed; part 2 zoomed in on the impact of memory overhead reservation and share levels on HA and DRS. This part looks at CPU scheduling, memory management and the impact oversized virtual machines have on the environment when a boot storm occurs.

Multiprocessor virtual machines
In most cases, adding more CPUs to a virtual machine does not automatically guarantee increased throughput of the application, because some workloads cannot take advantage of all the available CPUs. Sharing resources and scheduling these processes introduces additional overhead. For example, a four-way virtual machine is not four times as productive as a single-CPU system. If the application is unable to scale, it will not benefit from these additional available resources.
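The diminishing return of adding vCPUs can be illustrated with Amdahl’s law. This is a generic sketch, not VMware code; the 70% parallel fraction below is a hypothetical example value, not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Theoretical speedup of a workload where only part of the work
    can run in parallel across n_cpus."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

# A workload that is 70% parallelizable: the four-way VM is nowhere
# near four times as fast as the single-CPU configuration.
for n in (1, 2, 4, 8):
    print(n, "vCPU ->", round(amdahl_speedup(0.70, n), 2), "x speedup")
```

With a 70% parallel fraction, four vCPUs yield only about a 2.1x speedup, while the extra vCPUs still add scheduling overhead on the host.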
NODE INTERLEAVING: ENABLE OR DISABLE?
There seems to be a lot of confusion about this BIOS setting; I receive lots of questions on whether to enable or disable node interleaving. I guess the term “enable” makes people think it is some sort of performance enhancement. Unfortunately the opposite is true and it is strongly recommended to keep the default setting and leave node interleaving disabled.

Node interleaving option only on NUMA architectures
The node interleaving option exists on servers with a non-uniform memory access (NUMA) system architecture. The Intel Nehalem and AMD Opteron are both NUMA architectures. In a NUMA architecture multiple nodes exist. Each node contains a CPU and memory and is connected via a NUMA interconnect. A pCPU uses its onboard memory controller to access its own “local” memory and connects to the remaining “remote” memory via the interconnect. As a result of the different locations memory can exist in, this system experiences “non-uniform” memory access times.
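The cost of interleaving can be sketched with a simple weighted-latency model. The latency numbers below are hypothetical illustrations only; real local/remote latencies vary per platform.

```python
# Hypothetical access latencies in nanoseconds, for illustration only.
LOCAL_NS = 100.0
REMOTE_NS = 160.0

def avg_latency(local_fraction: float) -> float:
    """Average memory access latency given the fraction of accesses
    that hit node-local memory."""
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS

# Node interleaving enabled on a two-node system: memory is striped
# across both nodes, so roughly half of all accesses are remote.
interleaved = avg_latency(0.5)

# Interleaving disabled: a NUMA-aware scheduler (like the ESX CPU
# scheduler) keeps most accesses local, say 90% in this example.
numa_aware = avg_latency(0.9)

print(f"interleaved: {interleaved} ns, NUMA-aware: {numa_aware} ns")
```

Under these assumed numbers the NUMA-aware placement averages 106 ns per access versus 130 ns with interleaving, which is why leaving the NUMA topology exposed to the hypervisor is the better default.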
IMPACT OF OVERSIZED VIRTUAL MACHINES PART 2
In part 1 of this series on the impact of oversized virtual machines, NUMA architecture, memory overhead reservation and share levels were reviewed; part 2 zooms in on the impact of memory overhead reservation and share levels on HA and DRS.

Impact of memory overhead reservation on HA slot size
The VMware High Availability admission control policy “Host failures cluster tolerates” calculates a slot size to determine the maximum number of virtual machines that can be active in the cluster without violating failover capacity. This admission control policy determines the HA cluster slot size from the largest CPU reservation and the largest memory reservation plus its memory overhead reservation. If the virtual machine with the largest reservation (which could be an appropriately sized reservation) is oversized, its memory overhead reservation can still substantially impact the slot size. The HA admission control policy “Percentage of Cluster Resources Reserved” calculates the memory component of its mechanism by summing the reservation plus the memory overhead of each virtual machine, allowing the memory overhead reservation to have an even bigger impact on admission control than the calculation done by the “Host failures cluster tolerates” policy.

DRS initial placement
DRS uses a worst-case scenario during initial placement. Because DRS cannot determine the resource demand of a virtual machine that is not running, it assumes that both the memory demand and the CPU demand are equal to its configured size. Oversizing virtual machines decreases the options for finding a suitable host for the virtual machine. If DRS cannot guarantee that the full 100% of the resources provisioned for this virtual machine can be used, it will vMotion virtual machines away so that it can power on this single virtual machine. If there are not enough resources available, DRS will not allow the virtual machine to be powered on.
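The two admission control calculations above can be sketched as follows. The per-VM reservation and overhead figures are hypothetical examples, not values from the article.

```python
# Hypothetical cluster inventory: reservations in MHz/MB plus the
# static memory overhead reservation of each VM.
vms = [
    {"cpu_res_mhz": 500,  "mem_res_mb": 1024, "overhead_mb": 242},
    {"cpu_res_mhz": 1000, "mem_res_mb": 0,    "overhead_mb": 414},  # oversized VM
]

# "Host failures cluster tolerates": the slot size is derived from the
# largest CPU reservation and the largest memory reservation plus its
# memory overhead reservation.
cpu_slot_mhz = max(vm["cpu_res_mhz"] for vm in vms)
mem_slot_mb = max(vm["mem_res_mb"] + vm["overhead_mb"] for vm in vms)

# "Percentage of Cluster Resources Reserved": the memory component sums
# reservation plus memory overhead over ALL virtual machines, so every
# oversized VM's overhead contributes.
reserved_mem_mb = sum(vm["mem_res_mb"] + vm["overhead_mb"] for vm in vms)

print(cpu_slot_mhz, mem_slot_mb, reserved_mem_mb)
```

Note how the oversized VM’s 414 MB overhead reservation inflates the summed reservation of the percentage-based policy even though it reserves no memory itself.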
Shares and resource pools
When placing a virtual machine inside a resource pool, its shares are relative to the other virtual machines (and resource pools) inside the pool. Shares are relative to all the other components sharing the same parent; an easier way to put it is to call it a sibling share level. Therefore the numeric share values are not directly comparable across pools, because they are children of different parents. By default a resource pool is configured with a share amount equal to that of a 4 vCPU, 16GB virtual machine. As mentioned in part 1, shares are relative to the configured size of the virtual machine, implicitly stating that size equals priority. Now let’s take another look at the image above. The three virtual machines are reparented to the cluster root, next to resource pools 1 and 2. Suppose they are all 4 vCPU, 16GB machines; their share values are interpreted in the context of the root pool and they will receive the same priority as resource pool 1 and resource pool 2. This is not only wrong, but also dangerous in a denial-of-service sense: a virtual machine running at the same level as resource pools can suddenly find itself entitled to nearly all cluster resources. Because of this default share distribution, we always recommend avoiding placing virtual machines at the same level as resource pools. Unfortunately a virtual machine may end up reparented to the cluster root level when manually migrating it using the GUI; the current workflow defaults to the cluster root level instead of using the virtual machine’s current resource pool. Because of this it is recommended to increase the number of shares of the resource pool to reflect its priority level. More info about shares on resource pools can be found in Duncan’s post on yellow-bricks.com. Go to Part 3: Impact of oversized virtual machines.
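The sibling-level share division described above can be sketched numerically. The share values and pool names are hypothetical; a default resource pool simply carries the same shares as a 4 vCPU, 16GB virtual machine, which is what makes a root-level VM its equal.

```python
def divide(total_resources: float, siblings: dict) -> dict:
    """Divide a parent's resources among its children proportionally
    to their share values (shares only matter between siblings)."""
    total_shares = sum(siblings.values())
    return {name: total_resources * s / total_shares
            for name, s in siblings.items()}

# Two resource pools plus three 4 vCPU / 16GB VMs reparented to the
# cluster root: every sibling carries the same default share value.
root_children = {"pool1": 4000, "pool2": 4000,
                 "vm1": 4000, "vm2": 4000, "vm3": 4000}

entitlement = divide(100, root_children)  # percent of cluster resources
print(entitlement)
```

Each single VM is now entitled to as large a slice of the cluster (20% here) as an entire resource pool full of virtual machines, which is exactly the denial-of-service risk the article warns about.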
IMPACT OF OVERSIZED VIRTUAL MACHINES PART 1
Recently we had an internal discussion about the overhead an oversized virtual machine generates on the virtual infrastructure. An oversized virtual machine is a virtual machine that consistently uses less capacity than its configured capacity. Many organizations follow vendor recommendations and/or provision virtual machines sized according to the wishes of the customer, i.e. more resources equals better performance. By oversizing the virtual machine you can introduce the following overhead, or even worse, decrease the performance of the virtual machine or other virtual machines inside the cluster. Note: this article does not focus on large virtual machines that are correctly configured for their workloads.

Memory overhead
Every virtual machine running on an ESX host consumes some memory overhead in addition to the current usage of its configured memory. This extra space is needed by ESX for internal VMkernel data structures such as the virtual machine frame buffer and the mapping table for memory translation, i.e. mapping physical virtual machine memory to machine memory. The VMkernel calculates a static overhead for the virtual machine based on the number of vCPUs and the amount of configured memory. Static overhead is the minimum overhead required for virtual machine startup; DRS and the VMkernel use this metric for admission control and vMotion calculations. If the ESX host is unable to provide the unreserved resources for the memory overhead, the virtual machine will not be powered on; in the case of vMotion, the destination ESX host must be able to back the virtual machine reservation plus the static overhead, otherwise the vMotion will fail. The following table displays a list of common static memory overheads encountered in vSphere 4.1. For example, a 4 vCPU, 8GB virtual machine will be assigned a memory overhead reservation of 413.91 MB, regardless of whether it will use its configured resources or not.
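The admission check described above can be sketched as a simple comparison. The 413.91 MB figure is the vSphere 4.1 value quoted in the article for a 4 vCPU, 8GB virtual machine; the host capacity numbers are hypothetical.

```python
def can_admit(host_unreserved_mb: float,
              vm_reservation_mb: float,
              vm_static_overhead_mb: float) -> bool:
    """A VM can be powered on (or vMotioned in) only if the host can
    back its memory reservation plus its static memory overhead."""
    return host_unreserved_mb >= vm_reservation_mb + vm_static_overhead_mb

# 4 vCPU / 8GB VM with no explicit memory reservation: the static
# overhead reservation alone decides admission.
print(can_admit(500.0, 0.0, 413.91))  # host with 500 MB unreserved
print(can_admit(400.0, 0.0, 413.91))  # host with 400 MB unreserved
```

Even with a zero memory reservation, the oversized VM’s overhead reservation alone can block power-on or vMotion on a nearly full host.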
ENHANCED VMOTION COMPATIBILITY
Enhanced vMotion Compatibility (EVC) has been available for a while now, but it seems to be adopted slowly. Recently VMguru.nl featured the article “Challenge: vCenter, EVC and dvSwitches”, which illustrates another case where the customer did not enable EVC when creating the cluster. There seems to be a lot of misunderstanding about EVC and the impact it has on the cluster when enabled.

What is EVC?
VMware Enhanced vMotion Compatibility (EVC) facilitates vMotion between different CPU generations through the use of Intel FlexMigration and AMD-V Extended Migration technologies. When enabled for a cluster, EVC ensures that all CPUs within the cluster are vMotion compatible.

What is the benefit of EVC?
Because EVC allows you to migrate virtual machines between different generations of CPUs, you can mix older and newer server generations in the same cluster and still migrate virtual machines between these hosts with vMotion. This makes adding new hardware to your existing infrastructure easier and helps extend the value of your existing hosts.
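Conceptually, an EVC baseline behaves like the intersection of the CPU feature sets of all hosts in the cluster: every VM only ever sees features that every host can provide. This is a deliberately simplified illustration; the feature names below are examples and the real mechanism works via CPUID masking, not Python sets.

```python
# Hypothetical per-host CPU feature sets for two server generations.
host_features = [
    {"sse2", "sse3", "ssse3", "sse4_1"},                    # older generation
    {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "aes"},   # newer generation
]

# The cluster baseline exposes only the features common to all hosts,
# so a VM started anywhere can be vMotioned to any other host.
baseline = set.intersection(*host_features)
print(sorted(baseline))
```

The newer host’s extra instructions (sse4_2, aes in this example) are hidden from virtual machines; that is the compatibility trade-off EVC makes to keep vMotion possible across generations.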
EUROPEAN DISTRIBUTOR FOR HA AND DRS BOOK
As of today, our book “vSphere 4.1 HA and DRS Technical Deepdive” can be ordered via ComputerCollectief. ComputerCollectief is a Dutch computer book and software reseller and ships to most European countries. By using ComputerCollectief, we hope to avoid the long shipping times and accompanying costs. Go check it out: http://www.comcol.nl/detail/73133.htm. Comcol expects to be able to deliver at the end of this month.
DUTCH VBEERS
Simon Long of The SLOG is introducing vBeers to Holland. I’ve copied the text from his vBeers blog article: Every month Simon Seagrave and I try to organise a social get-together of like-minded virtualization enthusiasts, held in a pub in central London (and now Amsterdam). We like to call it vBeers. Before I go on, I would just like to state that although it’s called vBeers, you do NOT have to drink beer, or any other alcohol for that matter. This isn’t just an excuse to get blind drunk. We came up with the idea whilst on the Gestalt IT Tech Field Day back in April. We were chatting and we both recognised that we don’t get together enough to catch up, mostly due to busy work schedules and private lives. We felt that if we had a set date each month, the likelihood of us actually making that date would be higher than in previous attempts. So the idea of vBeers was born.
HA AND DRS TECHNICAL DEEPDIVE AVAILABLE
After spending almost a year on writing, drawing and editing, the moment Duncan and I waited for finally arrived… Our new book, the vSphere 4.1 HA and DRS Technical Deepdive, is available on CreateSpace and Amazon.com. Early this year Duncan approached me and asked if I was interested in writing a book together on HA and DRS; without hesitation I accepted the honor. Before discussing the contents of the book I would like to take the opportunity to thank our technical reviewers for their time, their wisdom and their input: Anne Holler (VMware DRS Engineering), Craig Risinger (VMware PSO), Marc Sevigny (VMware HA Engineering) and Bouke Groenescheij (Jume.nl). And a very special thanks to Scott Herold for writing the foreword! But most of all I would like to thank Duncan for giving me the opportunity to work together with him on creating this book. The in-depth discussions we had are without a doubt the most difficult I have ever experienced, very interesting, but most of all fun! Thanks! Now let’s take a look at the book. Please note that we are still working on an electronic version of the book and we expect to finish this early 2011. This is the description of the book that is up on CreateSpace: About the authors: Duncan Epping (VCDX 007) is a Consulting Architect working for VMware as part of the Cloud Practice. Duncan works primarily with Service Providers and large Enterprise customers. He is focused on designing Public Cloud Infrastructures and specializes in BC-DR, vCloud Director and VMware HA. Duncan is the owner of Yellow-Bricks.com, the leading VMware blog. Frank Denneman (VCDX 029) is a Consulting Architect working for VMware as part of the Professional Services Organization. Frank works primarily with large Enterprise customers and Service Providers. He specializes in Resource Management, DRS and storage.
Frank is the owner of frankdenneman.nl, which has recently been voted number 6 worldwide on vsphere-land.com. VMware vSphere 4.1 HA and DRS Technical Deepdive zooms in on two key components of every VMware-based infrastructure and is by no means a “how to” guide. It covers the basic steps needed to create a VMware HA and DRS cluster, but more importantly explains the concepts and mechanisms behind HA and DRS, which will enable you to make well-educated decisions. This book will take you into the trenches of HA and DRS and will give you the tools to understand and implement e.g. HA admission control policies, DRS resource pools and host affinity rules. On top of that, each section contains basic design principles that can be used for designing, implementing or improving VMware infrastructures. Coverage includes:
• HA node types
• HA isolation detection and response
• HA admission control
• VM Monitoring
• HA and DRS integration
• DRS imbalance algorithm
• Resource Pools
• Impact of reservations and limits
• CPU Resource Scheduling
• Memory Scheduler
• DPM
We hope you will enjoy reading it as much as we did writing it. Thanks! Eric Sloof received a proof copy of the book and shot a video about it.
SHOULD OR MUST VM-HOST AFFINITY RULES?
VMware vSphere 4.1 introduces a new affinity rule, called “Virtual Machines to Hosts” (VM-Host), which I described in the article “VM to Host affinity rule”. A short recap: VM-Host affinity rules are available in two flavors: Must run rules (Mandatory) Should run rules (Preferential) By providing these two options a new problem arises for the administrator\architect, when will the need occur for using the mandatory rule and when is it desired to use preferential rules? I think it all depends on the risk and limitations introduced by each rule. Let’s review difference between the rules, the behavior of each rule and the impact they have on cluster services and maintenance mode. What is the difference between a mandatory and a preferential rule?