NEW FLING: DRS ENTITLEMENT

I’m proud to announce the latest fling: DRS Entitlement. This fling is built by the performance team and provides insight into the demand and entitlement of the virtual machines and resource pools within a vSphere cluster. By default, it shows the active CPU and memory consumption, which by itself helps to understand the dynamics within the cluster, especially when you are using resource pools with different share values. In this example, I have two resource pools: one containing the high-value workloads of the organization, and one containing virtual machines used for test and dev operations. The high-value workloads should receive the resources they require at all times.

The What-If functionality allows you to simulate a few different scenarios: a 100% demand option and a simulation of resource allocation settings. The screenshot below shows the what-if entitlement. What if these workloads generate 100% activity, what resources do they require if they go to the max? This allows you to set the appropriate resource allocation settings, such as reservations and limits, on the resource pools or maybe even on particular virtual machines.

Another option is to specify particular Reservation, Limits, and Shares (RLS) settings on an object. Select the RLS option and select the object you want to use in the simulation. In this example, I selected the Low Value Workload resource pool and changed the share value setting of the resource pool. You can verify the new setting before running the analysis. Please note that this is an analysis; it does not affect the resource allocation of active workloads whatsoever. You can simulate different settings and understand the outcome. Once the correct setting is determined, you can apply it on the object manually, or you can use the PowerCLI option and export the PowerCLI one-liner to programmatically change the RLS settings.

Follow the instructions on the Flings website to install it on your vCenter. I would like to thank Sai Inabattini and Adarsh Jagadeeshwaran for creating this fling and for listening to my input! RUN DRS!
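Speaking of the PowerCLI export: the fling generates the exact one-liner for you, but for illustration, a hand-written equivalent built on the standard resource pool cmdlets could look like the sketch below. The resource pool name and share values are simply the ones from my example; adjust them to your own environment.

# Connect to vCenter first, e.g. Connect-VIServer -Server vcenter.example.local
# Illustrative only: lower the share value of the "Low Value Workload" resource pool
$rp = Get-ResourcePool -Name "Low Value Workload"
Set-ResourcePool -ResourcePool $rp -CpuSharesLevel Low -MemSharesLevel Low

# Or use custom share values instead of the predefined Low/Normal/High levels:
Set-ResourcePool -ResourcePool $rp -CpuSharesLevel Custom -NumCpuShares 2000 `
                 -MemSharesLevel Custom -NumMemShares 40960

Remember that, unlike the fling’s what-if analysis, running Set-ResourcePool does change the live resource allocation, so verify the simulated outcome first.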

STRETCHED CLUSTERS ON VMWARE CLOUD ON AWS, A REALLY BIG THING

This week Emad published an excellent article about the stretched cluster functionality of VMware Cloud on AWS. To sum up, you can now deploy a single vSphere cluster across two AWS Availability Zones.

A Trip to Memory Lane

I think the ability to stretch a vSphere cluster across two Availability Zones is a really big thing. Go back to the days where we had to refactor the application to make it highly available. To reduce application downtime, you typically used clustering software such as Microsoft Cluster Service or Veritas Cluster Server. But not all applications were fit for this solution. When we introduced VMware High Availability back in 2006, we brought a big change to the industry. From that point on, you could provide crash-consistent failover ability to all your workloads. No need to refactor any application, no need to build outlandish hardware solutions. Just enable a few tickboxes at the infrastructure layer, and every workload running inside a VM is protected. To this day, HA remains the most popular functionality of vSphere.

Amazon Web Services Resiliency Strategy

Amazon urges you to design your application to be resilient to infrastructure outages. Amazon AWS is hosted in multiple locations worldwide. These locations are composed of regions and Availability Zones. Each region is a separate geographic area that has multiple, isolated locations known as Availability Zones. AWS provides the ability to place instances and data in multiple locations, and you can take advantage of the safety and reliability of geographic redundancy by spanning your Auto Scaling group across multiple Availability Zones within a region and then attaching a load balancer to distribute incoming traffic across those Availability Zones. Incoming traffic is distributed equally across all Availability Zones enabled for your load balancer. This works very well if you are refactoring your application or building a completely new cloud-native stack. The challenge we face today is that not all applications lend themselves to being refactored, and some applications do not require the journey from monolithic to full FaaS.

Hybrid-Cloud Experience

With stretched clusters in VMware Cloud on AWS, we introduce the same ease of infrastructure resiliency to workloads that run on AWS infrastructure. Merely expand your vSphere cluster to six hosts and select the multi-AZ deployment. After that, the workloads in the Cloud SDDC are protected against AZ outages. If something happens, HA detects the failed VMs and restarts them on physical servers in the remaining AZ without human involvement. The ability to stretch your vSphere cluster across AZs allows you to easily provide resiliency to your workload within the AWS infrastructure without the Herculean effort of refactoring all your applications.

DYING HOME LAB - FEEDBACK WELCOME

The servers in my home lab are dying on a daily basis. After four years of active duty, I think they have the right to retire. So I need something else. But what? I can’t rent lab space as I work with unreleased ESXi code. I’ve been waiting for the Intel Xeon D 21xx Supermicro systems, but I have the feeling that Elon will reach Mars before we see these systems widely available. The system that I have in mind is the following:

DEDICATED HARDWARE IN A PUBLIC CLOUD WORLD

One of the more persistent misconceptions is that the components of VMware’s Software-Defined Data Center (SDDC) on VMware Cloud on AWS are virtualized, or that the deployed VMs run natively on Amazon. And to be honest, it’s not even weird that most people think this way. After all, Amazon Web Services launched in March 2006, 12 years ago, and AWS is synonymous with Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). All of a sudden, you can now “run vSphere on AWS”.

VBROWNBAG TECHTALKS VMWORLD CALL FOR PAPERS NOW OPEN

Although the selection process for the submitted VMworld 2018 sessions is still ongoing, vBrownBag announced their call for papers. As Duncan mentioned in his call for papers article: ‘Good luck, and remember: if you don’t end up getting selected, submit the proposal to a VMUG near you instead. They are always begging for community sessions.’ Think about signing up for the vBrownBag TechTalks as well. Since last year, all the vBrownBag sessions are published in the content catalog, so your session is visible to all 23,000+ attendees. Go right ahead and fill out this form.

THE PUBLIC SHAMING OF RESOURCE POOL-AS-A-FOLDER USER

Yesterday there was some public shaming of Antony Spiteri. He was outed for using vSphere resource pools as folders. https://twitter.com/davidhill_co/status/988797652346245126 A funny thread, and he truly deserved all the public shaming by the community members ;). All fun aside, using resource pools as folders is not recommended by VMware. As I described in the new vSphere 6.5 DRS white paper available at vSphere Central:

Correct use: Resource pools are an excellent construct to isolate a particular amount of resources for a group of virtual machines without having to micro-manage resource settings for each individual virtual machine. A reservation set at the resource pool level guarantees each virtual machine inside the resource pool access to these resources. Depending on their activity, these virtual machines can operate without any contention.

Incorrect use: Resource pools should not be used as a form of folders within the inventory view of the cluster. Resource pools consume resources from the cluster and distribute these amongst their child objects, which can be additional resource pools and virtual machines. Due to the isolation of resources, using resource pools as folders in a heavily utilized vSphere cluster can lead to an unintended level of performance degradation for some virtual machines inside or outside the resource pool. Understanding this behavior allows you to design a correct resource pool structure.

Currently, I’m working on a new vSphere DRS Resource Pool white paper which sheds some new light on the distribution of resources under normal conditions and under load (the Resource Pool Pie Paradox). I will keep you posted!
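As a side note to the “correct use” description above: carving out a guaranteed slice of cluster resources for a group of virtual machines is a short PowerCLI exercise. A minimal sketch, with a made-up cluster name and made-up reservation sizes:

# Illustrative sketch: create a resource pool with guaranteed resources for high-value workloads
# (cluster name and reservation sizes are examples only)
$cluster = Get-Cluster -Name "Cluster01"
New-ResourcePool -Location $cluster -Name "High Value Workloads" `
                 -CpuReservationMhz 20000 -MemReservationGB 64 `
                 -CpuSharesLevel High -MemSharesLevel High
# Note: older PowerCLI releases expose the memory reservation as -MemReservationMB instead

Used this way, the pool exists to guarantee resources, not to tidy up the inventory view.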

PUBLIC SPEAKING SCHEDULE

The VMUG season has started, and I have a few speaking sessions at various events. I thought it might be convenient to list the events and topics:

Date: February 22, Organization: North East UK VMUG, Location: Newcastle, Topic: VMware Cloud on AWS from a resource management perspective
Date: March 7, Organization: Swiss-French VMUG, Location: Lausanne, Switzerland, Topic: VMware Cloud on AWS from a resource management perspective
Date: March 8, Organization: Swiss-German VMUG, Location: Zurich, Switzerland, Topic: VMware Cloud on AWS from a resource management perspective
Date: March 20, Organization: Dutch VMUG, Location: Den Bosch, Netherlands, Topic: vSphere Resource Kit Double-Hour (Session 1: vSphere 6.5 Host Resource Deep Dive with Niels Hagoort; Session 2: vSphere 6.5 Clustering Deep Dive with Duncan Epping)
Date: March 29, Organization: Virtual VMUG, Location: Online, Topic: VMware Cloud on AWS from a resource management perspective
Date: April 10, Organization: Turkey VMUG, Location: Istanbul, Turkey, Topic: VMware Cloud on AWS from a resource management perspective
Date: May 24, Organization: Czech Republic VMUG, Location: Prague, Topic: vSphere 6.5 Host Resource Deep Dive with Niels Hagoort

Hope to see you there!

VIRTUALLY SPEAKING PODCAST #67 RESOURCE MANAGEMENT

Two weeks ago, Pete Flecha (a.k.a. Pedro Arrow) and John Nicholson invited me to their always awesome podcast to talk about resource management. During our conversation, we covered both on-prem features and the features of VMware Cloud on AWS that help cater to the needs of your workload. Being a guest on this podcast is an honour, and time flies talking to these two guys. Hope you enjoy it as much as I did.

VSPHERE 6.5 DRS AND MEMORY BALANCING IN NON-OVERCOMMITTED CLUSTERS

DRS is over a decade old and is still going strong. DRS is aligned with the premise of virtualization: resource sharing and overcommitment of resources. The goal of DRS is to provide compute resources to the active workload to improve workload consolidation on a minimal compute footprint. However, virtualization has surpassed the original principle of workload consolidation to provide unprecedented workload mobility and availability. With this change of focus, many customers do not overcommit on memory. A lot of customers design their clusters to contain (just) enough memory capacity to ensure all running virtual machines have their memory backed by physical memory. In this scenario, DRS behavior should be adjusted, as it traditionally focuses on active memory use. vSphere 6.5 provides this option in the DRS cluster settings. By ticking the box “Memory Metric for Load Balancing”, DRS uses the VM consumed memory for load-balancing operations. Please note that DRS is focused on consumed memory, not configured memory! DRS always keeps a close eye on what is happening rather than accepting static configuration. Let’s take a closer look at the DRS input metrics of active and consumed memory.

Out-of-the-box DRS Behavior

During load-balancing operations, DRS calculates the active memory demand of the virtual machines in the cluster. The active memory represents the working set of the virtual machine, which signifies the number of actively used pages in RAM. By using the working-set estimation, the memory scheduler determines which of the allocated memory pages are actively used by the virtual machine and which allocated pages are idle. To accommodate a sudden rapid increase of the working set, 25% of the idle consumed memory is included. Memory demand also includes the virtual machine’s memory overhead.

Let’s use a 16 GB virtual machine as an example of how DRS calculates the memory demand. The guest OS running in this virtual machine has touched 75% of its memory size since it was booted, but only 35% of its memory size is active. This means that the virtual machine has consumed 12288 MB, and 5734 MB of this is used as active memory. As mentioned, DRS accommodates a percentage of the idle consumed memory to be ready for a sudden increase in memory use. To calculate the idle consumed memory, the active memory, 5734 MB, is subtracted from the consumed memory, 12288 MB, resulting in a total of 6554 MB idle consumed memory. By default, DRS includes 25% of the idle consumed memory, i.e. 6554 MB * 25% = +/- 1639 MB. The virtual machine has a memory overhead of 90 MB. The memory demand DRS uses in its load-balancing calculation is as follows: 5734 MB + 1639 MB + 90 MB = 7463 MB. As a result, DRS selects a host that has 7463 MB available for this virtual machine if it needs to move it to improve the load balance of the cluster.

Memory Metric for Load Balancing Enabled

When enabling the option “Memory Metric for Load Balancing”, DRS takes the consumed memory plus the memory overhead into account for load-balancing operations. In essence, DRS uses the metric Active + 100% IdleConsumedMemory. The UI in vSphere 6.5 Update 1d allows you to get better visibility into the memory usage of the virtual machines in the cluster. The memory utilization view can be toggled between active memory and consumed memory. Recently, Adam Eckerle shared on Twitter a great article that outlines all the improvements of vSphere 6.5 Update 1d. Go check it out. Animated GIF courtesy of Adam.
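To make the two demand calculations easier to compare, here is a minimal sketch that reproduces the arithmetic of the 16 GB example above. The variable names and the PowerShell form are mine; the formulas follow the article.

# The 16 GB example from the article, spelled out (all values in MB)
$configured = 16 * 1024                  # 16384 MB configured memory
$consumed   = $configured * 0.75         # 12288 MB touched since boot
$active     = $configured * 0.35         # ~5734 MB active working set
$overhead   = 90                         # VM memory overhead

$idleConsumed = $consumed - $active      # ~6554 MB idle consumed memory

# Default DRS behavior: active + 25% of idle consumed + overhead
$demandActive   = $active + 0.25 * $idleConsumed + $overhead   # ~7463 MB

# "Memory Metric for Load Balancing" enabled: active + 100% of idle consumed + overhead
$demandConsumed = $active + $idleConsumed + $overhead          # 12378 MB (= consumed + overhead)

"{0:N0} MB (active-based) vs {1:N0} MB (consumed-based)" -f $demandActive, $demandConsumed

The gap between the two numbers is exactly the 75% of idle consumed memory that the default calculation leaves out.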
When reviewing the cluster with the default view, the sum of virtual machine memory utilization (active memory), the cluster looks pretty much balanced. It shows that host ESXi02 is busier than the others. However, since the difference in active memory utilization between the hosts is less than 20% and each virtual machine is receiving the memory it is entitled to, DRS will not move virtual machines around. Remember, DRS is designed to create as little overhead as possible. Moving a virtual machine to another host just to make the active usage more balanced is a waste of compute cycles and network bandwidth. The virtual machines receive what they want to receive right now, so why take the risk of moving VMs?

You get a different view of the current situation when you toggle the graph to consumed memory. Now we see a bigger difference in consumed memory utilization: much more than 20% between ESXi02 and the other two hosts. By default, DRS in vSphere 6.5 tries to clear a utilization difference of 20% between hosts. This is called Pair-Wise Balancing. However, since DRS is focused on active memory usage, Pair-Wise Balancing won’t be activated by the 20% difference in consumed memory utilization. After enabling the option “Memory Metric for Load Balancing”, DRS rebalances the cluster with the optimal number of migrations (as few as possible) to reduce overhead and risk.

Active versus Consumed Memory Bias

If you design your cluster with no memory overcommitment as the guiding principle, I recommend testing the vSphere 6.5 DRS option “Memory Metric for Load Balancing”. You might want to switch DRS to manual mode first, to verify the recommendations.
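As a purely illustrative aside (this is not the actual DRS algorithm, just the 20% pair-wise threshold described above applied to invented numbers), a quick way to see which host pairs would trip the threshold:

# Check which host pairs differ by more than 20% in consumed-memory utilization
# (utilization numbers are made up to mirror the ESXi02 example above)
$util  = @{ ESXi01 = 0.45; ESXi02 = 0.78; ESXi03 = 0.48 }
$hosts = @($util.Keys)
for ($i = 0; $i -lt $hosts.Count; $i++) {
    for ($j = $i + 1; $j -lt $hosts.Count; $j++) {
        $diff = [math]::Abs($util[$hosts[$i]] - $util[$hosts[$j]])
        if ($diff -gt 0.20) {
            "{0} and {1} are {2:P0} apart" -f $hosts[$i], $hosts[$j], $diff
        }
    }
}

With active memory as the input the differences stay under the threshold, so DRS leaves the cluster alone; with consumed memory as the input, ESXi02 stands out and DRS rebalances.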

EXPLAINER ON #SPECTRE & #MELTDOWN BY GRAHAM SUTHERLAND

Sometimes you stumble across a brilliant Twitter thread, so good that it should never be lost. Graham Sutherland (@gsuberland) helped the world understand the Spectre and Meltdown bugs. I’m publishing his tweet thread in text form, as this is just the best explanation of the bugs I’ve seen. Please note that VMware has released its response to Bounds-Check Bypass (CVE-2017-5753), Branch Target Injection (CVE-2017-5715) & Rogue Data Cache Load (CVE-2017-5754), a.k.a. Meltdown & Spectre.