Kubernetes and vSphere Compute and Storage Doubleheader VMworld Session

July 19, 2018 by frankdenneman

Kubernetes is hot! It is one of the most talked-about technologies of this year. For some, it's the next platform; for others, it's just another technology making its way into the datacenter. Will it replace virtual machines? Will it displace vSphere? Some ask why you would run Kubernetes on top of vSphere when you can run it on bare metal. We would rather not go back to 2005 and deal with a sprawl of bare-metal servers; we believe Kubernetes and vSphere are better together!
In the session “CNA1553BU – Deep Dive: The value of Running Kubernetes on vSphere,” Michael Gasch and I review the behavior of Kubernetes resource management, optimization, and availability for container orchestration. Kubernetes is a system optimized for cloud-native workloads, where failure and disruption are anticipated, but what about the infrastructure that is required to run these cloud-native apps? What about Kubernetes' ability to consume the available resources economically and optimally?

We will answer these questions and reveal why vSphere, with its extensive features such as high availability, NUMA optimization, and the Distributed Resource Scheduler, is such a good match. In this session, we explore the critical elements of a container and demonstrate that Kubernetes does not run in thin air. Running Linux on bare metal or inside a VM determines your scalability, your recoverability, and your portability. If you spin up a Kubernetes cluster at Amazon or Google, they will deploy it for you in virtual machines. If these cloud-native giants use VMs, why would you use bare metal?

Adding vSphere to the picture, Kubernetes gains several advantages for both cloud-native and traditional workloads. vSphere also plays a critical role in keeping the Kubernetes control plane components highly available in case of planned and unplanned downtime. We are going to detail recommended DRS and HA settings and many other best practices for Kubernetes on vSphere, based on real-world customer scenarios. Of course, an outlook on upcoming improvements to the Kubernetes on vSphere integration should not be missing in a deep-dive session! Last but not least, you'll definitely learn how to respond to common objections to win back your end-user.
Still not convinced? Let’s dive into the behavior of Linux CPU scheduling versus ESXi CPU and NUMA scheduling and help you understand how to size and deploy your Kubernetes cluster on vSphere correctly.

Developers shouldn't need to worry about all these settings and the underlying layers. They just want to deploy the application, but it's our job to cater to the needs of the application and make sure the application runs consistently and constantly. This applies to compute, but also to storage. Seven out of ten applications that run on Kubernetes are stateful, so it makes sense to incorporate persistent storage in your Kubernetes design.

Some applications are able to provide certain services, such as replication, themselves, so it makes no sense to “replicate” that service at the infrastructure layer. vSAN and its storage policies allow the admin to provide storage services that are tailor-made for the application stack. Cormac Hogan and Christos Karamanolis talk about why vSAN is the ultimate choice for running next-gen apps. Visit their session “HCI1338BU – HCI: The Ideal Operational Environment for Cloud-Native Applications” to hear about real-world use cases and learn what you need to do when dealing with these next-gen apps.
Please note that if you attempt to add these sessions to your schedule, you might get a warning that you are on the waiting list. As we understand it, all sessions are initially booked in small rooms and, depending on the waiting list, are moved to bigger rooms. So sign up for these sessions even if they state waiting list only; it will be sorted out during the upcoming weeks.

Hope to see you in our session!

Filed Under: VMware

Introduction to Elastic DRS

July 17, 2018 by frankdenneman

VMware Cloud on AWS allows you to deploy physical ESXi hosts on demand. You can scale in and scale out your cluster by logging into the console.

This elasticity allows you to right-size your SDDC environment for the current workload demand. No more long procurement process, no more waiting for the vendor to ship the goods. No more racking, stacking in a cold dark datacenter. Just with a few clicks, you get new physical resources added to your cluster, ESXi and vSAN fully installed, configured, patched and ready to go!
Having physical resources available on demand is fantastic, but it still requires manual monitoring and manual operations to scale the vSphere cluster out or in. Wouldn't it be more convenient if the cluster automatically responded to the dynamic nature of the workloads? As of today, you can enable Elastic DRS.
Introducing Elastic DRS
Elastic Distributed Resource Scheduler (EDRS) is a policy-based solution that automatically scales a vSphere cluster in VMware Cloud on AWS based on utilization. EDRS monitors CPU, memory, and storage resources for scaling operations. EDRS monitors the vSphere cluster continuously, and every 5 minutes it runs the algorithm to determine whether a scale-out or scale-in operation is necessary.

Algorithm Behavior
EDRS is configured with thresholds for each resource and generates scaling recommendations if utilization consistently remains above or below the respective thresholds. The EDRS algorithm takes spikes and randomness of utilization into consideration when generating these scaling recommendations.
Scaling Operations
Thresholds are defined for scale-out operations and scale-in operations. To avoid generating recommendations based on spikes, EDRS only generates a scaling operation if the resource utilization shows consistent progress towards a threshold. To trigger a scale-out operation, a single threshold must be exceeded. That means that if CPU utilization shows consistent progress towards the threshold and at one point exceeds it, EDRS triggers an event and adds an ESXi host to the vSphere cluster.

Similar to adding an ESXi host manually, the new ESXi host is installed with the same ESXi version and patch level, is configured with the appropriate logical networks, and its capacity is added to the vSAN datastore.

To automatically scale in the cluster, utilization across all three resources must be consistently below the specified scale-in thresholds.
Minimum and Maximum Number of ESXi hosts
You can restrict the cluster size with a minimum and a maximum number of ESXi hosts. EDRS can be enabled if the cluster consists of four ESXi hosts, and EDRS does not scale in beyond the four-host minimum. When setting a maximum number of ESXi hosts, all ESXi hosts in the vSphere cluster, including those in maintenance mode, are included in the count. Only active ESXi hosts are counted towards the minimum. As a result, the VMware Cloud on AWS SDDC ignores EDRS recommendations during maintenance and hardware remediation operations. Currently, the maximum number of hosts in an Elastic DRS-enabled cluster is 16.
Scaling Policies
EDRS provides policies to adjust the behavior of scaling operations. EDRS provides two scaling policies, one optimized for cost and one for performance. Both policies have the same scale-out thresholds; they only differ in their scale-in thresholds.

Scale Out Threshold | Performance Optimized | Cost Optimized
CPU                 | 90%                   | 90%
Memory              | 80%                   | 80%
Storage             | 70%                   | 70%

As a result, if the cluster consistently utilizes more than 80% of its memory, EDRS triggers a scale-out operation that adds a new host to the vSphere cluster. Please note that the load is tracked at the host level and then aggregated. EDRS aggregates the CPU and memory load per fault domain; for storage, it is aggregated at the vSAN datastore level.

Scale In Threshold | Performance Optimized | Cost Optimized
CPU                | 50%                   | 60%
Memory             | 50%                   | 60%
Storage            | 20%                   | 20%

In essence, the performance-optimized policy is more eager to keep resources than the cost-optimized policy. If you set the EDRS cluster to cost-optimized, an ESXi host is removed from the vSphere cluster if CPU and memory utilization are consistently below 60% and storage utilization is consistently below 20%.
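To make the two policies and the decision rule concrete, here is a minimal sketch in Python. It is not the actual EDRS algorithm: the names (ScalingPolicy, recommend) are made up, and "consistently" is simplified to "every sample in the evaluation window", while the threshold values, the one-host-at-a-time behavior, and the 4/16 host bounds come straight from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ScalingPolicy:
    scale_out: Dict[str, float]  # exceed ANY of these consistently -> add a host
    scale_in: Dict[str, float]   # stay below ALL of these consistently -> remove a host

# Threshold values taken from the tables above (expressed as fractions)
PERFORMANCE_OPTIMIZED = ScalingPolicy(
    scale_out={"cpu": 0.90, "memory": 0.80, "storage": 0.70},
    scale_in={"cpu": 0.50, "memory": 0.50, "storage": 0.20},
)
COST_OPTIMIZED = ScalingPolicy(
    scale_out={"cpu": 0.90, "memory": 0.80, "storage": 0.70},
    scale_in={"cpu": 0.60, "memory": 0.60, "storage": 0.20},
)

def recommend(samples: List[Dict[str, float]], policy: ScalingPolicy,
              hosts: int, min_hosts: int = 4, max_hosts: int = 16) -> str:
    """Return 'scale-out', 'scale-in' or 'none' for a window of utilization samples.

    One resource consistently above its scale-out threshold is enough to add a
    host; all three resources must stay consistently below their scale-in
    thresholds to remove one. The host count stays within the configured bounds.
    """
    for resource, limit in policy.scale_out.items():
        if hosts < max_hosts and all(s[resource] > limit for s in samples):
            return "scale-out"  # one host is added at a time
    if hosts > min_hosts and all(s[r] < t for s in samples
                                 for r, t in policy.scale_in.items()):
        return "scale-in"
    return "none"

# Example: memory stays above 80% for the whole evaluation window -> add a host
window = [{"cpu": 0.55, "memory": 0.85, "storage": 0.40}] * 12
print(recommend(window, COST_OPTIMIZED, hosts=5))  # scale-out
```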
How to Configure Elastic DRS
Log into your VMware Cloud on AWS console and select the cluster. A message box will show Elastic DRS is enabled on the cluster.

To further fine-tune Elastic DRS, you can either click the green message box at the top right of your screen or select the option “Edit EDRS settings” at the bottom of your screen.

The next step is to select the scaling policy. As mentioned, EDRS provides two scaling policies, optimized for cost or performance. In this screen, you can fine-tune the behavior of Elastic DRS or disable it if you want to keep the host count of your cluster at a static level.

Please note that if you select the default settings, you give EDRS permission to scale up to 16 physical ESXi nodes. If this number of ESXi hosts is too high for you, please adjust the maximum cluster size.

EDRS scales out one node at a time and evaluates the current workload using a time window of one hour; it does not add multiple hosts at once.

Elastic DRS is designed to adjust to your workload dynamically; it responds to the current demand and scales in and out in a more fluid way. If you are aware of a high volume of incoming workload, you can add multiple hosts to the cluster swiftly by logging into the console, selecting the cluster to scale out, and selecting Add Hosts.

You can fine-tune EDRS by using PowerCLI. Kyle Ruddy will publish an article containing the PowerCLI commands shortly.

Filed Under: VMware

Resource Pools and Sibling Rivalry

July 16, 2018 by frankdenneman

One of the most powerful constructs in the Software-Defined Data Center is the resource pool. The resource pool allows you to abstract and isolate cluster compute resources. Unfortunately, it's mostly misunderstood, and it received a bad rep in the past that it cannot get rid of.
One of the challenges of resource pools is to fully commit to them. Placing virtual machines next to resource pools can have an impact on resource distribution. This article zooms in on sibling rivalry.
But before this adventure begins, I would like to stress that the examples provided in this article are a worst-case scenario. In this scenario, all VMs are 100% active. An uncommon situation, but it helps to easily explain the resource distribution. Later in the article, I use a few examples in which some VMs are active and some are idle. And as you will see, resource pools aren't that bad after all.
Resource Pool Size
Because resource pool shares are relative to other resource pools or virtual machines with the same parent resource pool, it is important to understand how vCenter sizes resource pools.
The values of CPU and memory shares applied to resource pools are similar to those of virtual machines. By default, a resource pool is sized like a virtual machine with 4 vCPUs and 16GB of RAM. Depending on the selected share level, a predefined number of shares is issued. Similar to VMs, four share levels can be selected. There are three predefined settings: High, Normal, and Low, which specify share values in a 4:2:1 ratio, and the Custom setting, which can be used to specify a different relative relationship.

Share Level | Shares of CPU | Shares of Memory
Low         | 2,000         | 81,920
Normal      | 4,000         | 163,840
High        | 8,000         | 327,680

Caution must be taken when placing VMs at the same hierarchical level as resource pools, as VMs can end up with a higher priority than intended. For example, in vSphere 6.7, the largest virtual machine can be equipped with 128 vCPUs and 6 TB of memory. A 128-vCPU, 6 TB VM with a High share level owns 256,000 (128 x 2,000) CPU shares and 122,560,000 (6,128,000 x 20) memory shares. Comparing this VM with a resource pool set to High results in a CPU ratio of 32:1 and a memory ratio of 374:1. The previous is an extreme example, but the reality is that a 4-vCPU, 16GB VM is not uncommon anymore. Placing such a VM next to a resource pool results in unfair sibling rivalry.
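As a quick sanity check on these numbers, the sketch below redoes the arithmetic. The per-vCPU and per-MB values (2,000 CPU shares per vCPU and 20 memory shares per MB at the High level) are the standard VM share presets; the resource pool values come from the table above, and the function name is just for illustration.

```python
# Standard VM share presets per level: (CPU shares per vCPU, memory shares per MB)
VM_PRESETS = {"Low": (500, 5), "Normal": (1000, 10), "High": (2000, 20)}

# Default resource pool share values (sized like a 4 vCPU / 16 GB VM), from the table above
RP_PRESETS = {"Low": (2000, 81920), "Normal": (4000, 163840), "High": (8000, 327680)}

def vm_shares(vcpus: int, memory_mb: int, level: str = "Normal") -> tuple:
    """Return (CPU shares, memory shares) for a VM at the given share level."""
    cpu_per_vcpu, mem_per_mb = VM_PRESETS[level]
    return vcpus * cpu_per_vcpu, memory_mb * mem_per_mb

# The monster-VM example from the text: 128 vCPUs, 6,128,000 MB of memory, share level High
vm_cpu, vm_mem = vm_shares(128, 6_128_000, "High")
rp_cpu, rp_mem = RP_PRESETS["High"]

print(vm_cpu, vm_mem)                                  # 256000 122560000
print(round(vm_cpu / rp_cpu), round(vm_mem / rp_mem))  # 32 374 -> the 32:1 and 374:1 ratios
```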
The Family Tree of Resource Consumers
As shares determine the priority of the resource pool or virtual machine relative to its siblings, it is important to determine which objects compete for priority.

In the scenario depicted above, multiple sibling levels are present. VM01 and Resource Pool-1 are child objects of the cluster and therefore are on the same sibling level. VM02 and VM03 are child objects of Resource Pool-1. VM02 and VM03 are siblings, and both compete for resources provided by Resource Pool-1. DRS compares their share values to each other. The share values of VM01 and the other two VMs cannot be compared with each other because they each have different parents and thus do not experience sibling rivalry.
Shares indicate the priority at that particular hierarchical level, but the relative priority of the parent at its level determines the availability of the total amount of resources.
VM01 is a 2-vCPU 8GB virtual machine. The share value of Resource Pool-1 is set to high. As a result, the resource pool owns 8000 shares of CPU. The share value of VM01 is set to Normal and thus it owns 2000 CPU shares. Contention occurs, and the cluster distributes its resources between Resource Pool-1 and VM01. If both VM02 and VM03 are 100% utilized, Resource Pool-1 receives 80% of the cluster resources based on its share value.

Resource Pool-1 divides its resources between VM02 and VM03. Both child objects own an equal number of shares and therefore each receive 50% of the resources of Resource Pool-1.

This 50% of Resource Pool-1's resources equals 40% of the cluster resources. For now, both VM02 and VM03 are able to receive more resources than VM01. However, three additional VMs are placed inside Resource Pool-1. The new VMs each own 2,000 CPU shares, increasing the total number of outstanding shares to 10,000.

The distribution at the first level remains the same during contention. The cluster distributes its resources amongst its child objects, VM01 and Resource Pool-1: 20% to VM01 and 80% to Resource Pool-1. Please note this only occurs when all objects are 100% utilized.
If VM01 generates only 50% of its load while the VMs in Resource Pool-1 are 100% utilized, the cluster flows the unused resources to the resource pool to satisfy the demand of its child objects.
The dynamic entitlement is adjusted to the actual demand. The VMs inside RP-1 are equally active; as a result of the reduced activity of VM01, they each receive 2% more resources.

VM02, VM03, and VM04 start to idle. The Resource Pool shifts the entitlement and allocates the cluster resources to the VMs that are active, VM05 and VM06. They each get 50% of 80% of the cluster resources due to their sibling rivalry.
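The distribution scenarios above can be reproduced with a small sketch. It is a simplification (the names are made up, every VM is assumed to demand its full entitlement, and reservations and limits are ignored), but it shows the core mechanic: shares only compete among siblings, and a child can never get more than its parent's slice.

```python
from typing import Dict

def distribute(children: Dict[str, dict], capacity: float = 100.0) -> Dict[str, float]:
    """Divide capacity among siblings proportional to their shares, then recurse
    into resource pools. Assumes every VM demands 100% of its entitlement."""
    total_shares = sum(child["shares"] for child in children.values())
    entitlements = {}
    for name, child in children.items():
        slice_ = capacity * child["shares"] / total_shares
        if "children" in child:                       # a resource pool
            entitlements.update(distribute(child["children"], slice_))
        else:                                         # a virtual machine
            entitlements[name] = slice_
    return entitlements

# Layout from the article: VM01 (Normal, 2 vCPU = 2000 shares) next to
# Resource Pool-1 (High = 8000 shares) containing five VMs of 2000 shares each.
cluster = {
    "VM01": {"shares": 2000},
    "RP-1": {"shares": 8000, "children": {
        f"VM{i:02d}": {"shares": 2000} for i in range(2, 7)
    }},
}
print(distribute(cluster))
# {'VM01': 20.0, 'VM02': 16.0, 'VM03': 16.0, 'VM04': 16.0, 'VM05': 16.0, 'VM06': 16.0}
```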

Share Levels are Pre-sets, not Classes
A VM that is placed inside the resource pool, or created in a resource pool, does not inherit the share level of the resource pool. When creating a VM or a resource pool, vCenter assigns the Normal share level by default, independent of the share level of its parent.
Think of share levels as presets of share values. Configure a virtual machine with the share level set to High, and it gets 2,000 CPU shares per vCPU; a resource pool set to High gets 8,000 CPU shares, as it is sized like a 4-vCPU VM. A VM configured with the share level set to Low gets 500 CPU shares per vCPU. If that VM has 4 vCPUs, it owns the same number of shares as a 1-vCPU VM with the share level set to High. Both compete with each other based on share amounts, not share level values.
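A tiny example, using the standard per-vCPU preset values, makes the point: the level is only a multiplier, and during contention only the resulting share count matters among siblings.

```python
CPU_SHARES_PER_VCPU = {"Low": 500, "Normal": 1000, "High": 2000}

def cpu_shares(vcpus: int, level: str) -> int:
    return vcpus * CPU_SHARES_PER_VCPU[level]

# A 4-vCPU VM set to Low owns exactly as many CPU shares as a 1-vCPU VM set to High,
# so despite the different labels they receive equal priority during contention.
print(cpu_shares(4, "Low"), cpu_shares(1, "High"))  # 2000 2000
```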
Next Article
This article is a primer for a question about which direction we should take with resource pools. This week I will post a follow-up article that zooms in on possible changes to the shares behavior of resource pools in the near future. Stay tuned.

Filed Under: DRS, VMware

Virtually Speaking Podcast about Technical Writing

July 11, 2018 by frankdenneman

Last week Duncan and I were guests on the ever-popular Virtually Speaking Podcast. In this show we discussed the differences in technical writing, i.e., writing a blog post versus writing a book. We spoke a lot about the challenges of writing a book and the importance of a supporting cast. We received a lot of great feedback on social media, and Pete told me the episode was downloaded more than 1,000 times in the first 24 hours. I think this is especially impressive as he published the podcast on a Saturday afternoon. Due to this popularity, I thought it might be cool to share the episode in case you missed the announcement.
Pete and John shared the links to our VMworld sessions on this page. During the show, I mentioned the VMworld session of Katarina Wagnerova and Mark Brookfield. If you go to VMworld, I would recommend attending this session. It’s always interesting to hear people talk about how they designed an environment and dealt with problems in a very isolated place on earth.
Enjoy listening to the show.

Filed Under: VMware Tagged With: Podcast, vSphere

Resource Consumption of Encrypted vMotion

June 26, 2018 by frankdenneman

vSphere 6.5 introduced encrypted vMotion, which encrypts vMotion traffic if the destination and source host are capable of supporting it. When encryption is used, vMotion traffic consumes more CPU cycles on both the source and destination host. This article zooms in on the CPU consumption impact of encrypted vMotion on the vSphere cluster and how DRS leverages this new(ish) technology.

CPU Consumption of vMotion Process

ESXi reserves CPU resources on both the destination and the source host to ensure vMotion can consume the available bandwidth. ESXi only takes the number of vMotion NICs and their respective speed into account; the number of vMotion operations does not affect the total CPU resources reserved! 10% of a CPU core is reserved for a 1 GbE NIC and 100% of a CPU core for a 10 GbE NIC, and vMotion is configured with a minimum reservation of 30%. Therefore, if you have a 1 GbE NIC configured for vMotion, it still reserves at least 30% of a single core.
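A minimal sketch of that reservation rule, assuming the per-NIC percentages simply add up (10% of a core per 1 GbE NIC, 100% per 10 GbE NIC, with the 30% floor); the exact accounting inside ESXi may differ.

```python
def vmotion_cpu_reservation(nic_speeds_gbe: list) -> float:
    """Return the CPU capacity (in fractions of a core) ESXi reserves for vMotion.
    Only the vMotion NICs and their speeds matter, not the number of concurrent
    vMotion operations."""
    per_nic = {1: 0.10, 10: 1.00}   # fraction of a core reserved per NIC speed
    minimum = 0.30                  # vMotion reserves at least 30% of a core
    return max(sum(per_nic[speed] for speed in nic_speeds_gbe), minimum)

print(vmotion_cpu_reservation([1]))       # 0.3 -> a single 1 GbE NIC still hits the 30% floor
print(vmotion_cpu_reservation([10, 10]))  # 2.0 -> two 10 GbE NICs reserve two full cores
```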

Encrypted vMotion

As mentioned, vSphere 6.5 introduced encrypted vMotion, and by doing so it also introduced a new stream channel architecture. When an encrypted vMotion process is started, three stream channels are created: Prepare, Encrypt, and Transmit. The encryption and decryption process consumes CPU cycles, and to reduce the overhead as much as possible, the encrypted vMotion process uses the AES-NI instruction set of the physical CPU.
AES-NI stands for Advanced Encryption Standard New Instructions and was introduced in the Intel Westmere-EP generation (2010) and AMD Bulldozer (2011). It's safe to say that most data centers run on AES-NI-equipped CPUs. However, if the source or destination host is not equipped with AES-NI, vMotion automatically reverts to unencrypted mode if the default setting is selected.
Although encrypted vMotion leverages this special CPU instruction set to offload overhead, it does increase CPU utilization. The technical paper “VMware vSphere Encrypted vMotion Architecture, Performance, and Best Practices” published by VMware lists the overhead on the source and destination host.

Encrypted vMotion CPU Overhead on the Source Host

Encrypted vMotion CPU Overhead on the Destination Host
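To get a feel for the per-page cost of the Encrypt stream channel, here is a small, self-contained sketch that times AES-GCM encryption of 4 KB memory pages in Python. It is purely illustrative: the cipher mode, page size, and key handling are assumptions, not the actual vMotion implementation, and the absolute numbers say nothing about ESXi, only that the work scales with the amount of memory transferred (the cryptography library's OpenSSL backend does use AES-NI when the CPU exposes it, which is exactly the kind of offload described above).

```python
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PAGE_SIZE = 4096                 # assume 4 KB guest memory pages
PAGES = 25_600                   # 100 MB worth of pages for the measurement

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
page = os.urandom(PAGE_SIZE)

start = time.perf_counter()
for i in range(PAGES):
    nonce = i.to_bytes(12, "big")         # unique nonce per page (never reuse with one key)
    aesgcm.encrypt(nonce, page, None)     # encrypt and authenticate one page
elapsed = time.perf_counter() - start

mb = PAGES * PAGE_SIZE / (1024 * 1024)
print(f"Encrypted {mb:.0f} MB in {elapsed:.2f}s -> {mb / elapsed:.0f} MB/s on this CPU")
```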

Encrypted vMotion is a per-VM setting; by default, every VM is configured with Encrypted vMotion set to Opportunistic. The three settings are:
VM Options Encrypted vMotion

Setting       | Behavior
Disabled      | Does not use encrypted vMotion.
Opportunistic | Uses encrypted vMotion if the source and destination host support it. Only vSphere 6.5 and later hosts use encrypted vMotion.
Required      | Only allows encrypted vMotion. If the source or destination host does not support encrypted vMotion, migration with vMotion is not allowed.
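The table translates into a simple per-VM decision, sketched below; the enum names mirror the UI settings, while the host-capability flag is a hypothetical placeholder for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class EncryptedVMotion(Enum):
    DISABLED = "Disabled"
    OPPORTUNISTIC = "Opportunistic"     # the per-VM default
    REQUIRED = "Required"

@dataclass
class Host:
    supports_encrypted_vmotion: bool    # hypothetical flag: vSphere 6.5+ host

def plan_migration(setting: EncryptedVMotion, source: Host, destination: Host) -> str:
    both_capable = (source.supports_encrypted_vmotion
                    and destination.supports_encrypted_vmotion)
    if setting is EncryptedVMotion.DISABLED:
        return "plain vMotion"
    if setting is EncryptedVMotion.OPPORTUNISTIC:
        return "encrypted vMotion" if both_capable else "plain vMotion"
    # REQUIRED: refuse the migration rather than fall back to plain vMotion
    return "encrypted vMotion" if both_capable else "migration not allowed"

print(plan_migration(EncryptedVMotion.OPPORTUNISTIC, Host(True), Host(False)))  # plain vMotion
print(plan_migration(EncryptedVMotion.REQUIRED, Host(True), Host(False)))       # migration not allowed
```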

Please be aware that encrypted vMotion settings are transparent to DRS. DRS generates a load-balancing migration, and when the vMotion process starts, it verifies the requirements. Due to this transparency, DRS does not take encrypted vMotion settings and host compatibility into account when generating a recommendation.
If you select Required because of security standards, it is important to understand whether you are running a heterogeneous cluster with various vSphere versions. Is every host in your cluster running 6.5? Otherwise, you are impacting the ability of DRS to load-balance optimally. Are there different CPU generations inside the cluster, and do they all support AES-NI? Please make sure the BIOS version supports AES-NI and that AES-NI is enabled in the BIOS! Also, verify that the applied Enhanced vMotion Compatibility (EVC) baseline exposes AES-NI.

CPU Headroom

It is important to keep some unreserved and unallocated CPU resources available for the vMotion process to avoid creating gridlock. DRS needs some resources to run its threads, and vMotion requires resources to move VMs to less-utilized ESXi hosts. Know that encrypted vMotion taxes the system more; in oversaturated clusters, it might be worth verifying whether your security officer states encrypted vMotion as a requirement.

Filed Under: VMware
