VMWARE CLOUD ON AWS ON VIRTUALLY SPEAKING PODCAST
Last week I had the pleasure of connecting again with my friends and colleagues Pete Flecha, a.k.a. PedroArrow, and eternal sunshine John Nicholson. During the podcast, we discussed the road to hybrid cloud, cloud mobility, multi-cloud operations, and whether or not apps need to be replatformed. It’s always fun hanging out with these guys, especially when talking about cool things. I hope you enjoy the show as much as I did.
AMD EPYC AND VSPHERE VNUMA
AMD is gaining popularity in the server market with the EPYC CPU platform. The EPYC platform provides a high core count and a large memory capacity. If you are familiar with previous AMD generations, you know that AMD’s method of operation differs from Intel’s. For reference, take a look at the article I wrote in 2011 about the 12-core 6100 Opteron, code-named Magny-Cours. EPYC increases the scale but builds on the previously introduced principles. Let’s review the EPYC architecture and see how it can impact your VM sizing and ESXi configuration. (Please note that this article is NOT intended as a good/bad comparison between AMD and Intel; I’m just describing the architectural differences.)
KUBERNETES, SWAP AND THE VMWARE BALLOON DRIVER
Kubernetes requires swap to be disabled at the OS level. As stated in the 1.8 release changelog: the kubelet now fails if swap is enabled on a node. Why disable swap? Turning off swap doesn’t mean you are unable to create memory pressure, so why disable such a benevolent tool? Disabling swap doesn’t make any sense if you look at it from a single-workload, single-system perspective. However, Kubernetes is a distributed system that is designed to operate at scale. When running a large number of containers on a vast fleet of machines, you want predictability and consistency, and disabling swap is the right approach. It’s better to kill a single container than to have multiple containers run on a machine at an unpredictable, and probably slow, rate.
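To make the kubelet’s behavior concrete, here is a minimal Python sketch of the kind of check involved (the kubelet itself is written in Go; this is an illustration, not its actual source). On Linux, /proc/swaps contains only a header line when swap is fully disabled; any additional line is an active swap device or file.

```python
def swap_enabled(proc_swaps_text: str) -> bool:
    """Return True if any active swap device is listed.

    /proc/swaps starts with a header line; every additional
    non-empty line is an active swap partition or swap file.
    """
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) > 1

# /proc/swaps on a node with swap disabled: header only.
no_swap = "Filename\tType\tSize\tUsed\tPriority\n"

# /proc/swaps on a node with one active swap partition.
with_swap = no_swap + "/dev/sda2\tpartition\t8388604\t0\t-2\n"
```

With a check like this, the node agent can refuse to start (or flag the node) the moment any swap device is present, which is exactly the predictability-over-flexibility trade-off described above.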
FREE VSPHERE CLUSTERING DEEP DIVE BOOK AT VMWORLD EUROPE
Last year Rubrik gave away hard copies of the vSphere Host Deep Dive book; this year they are doing it again with the vSphere 6.7 Clustering Deep Dive book. Come by the Rubrik booth #P305 on Tuesday from 4:00 PM to 5:00 PM to get a signed, complimentary copy of vSphere 6.7 Clustering Deep Dive and meet the authors. Last year we gave away a thousand copies, and they were gone within an hour. As many of you will remember, the line was insane. This year we have a similar amount, so make sure you’re on time.
KUBERNETES AT VMWORLD EUROPE
With only a few days left until VMworld Europe 2018 kicks off in Barcelona, I would like to highlight some of the many Kubernetes-focused sessions. I’ve selected a number of breakout sessions and meet-the-expert sessions based on my exposure to them at VMworld US or the quality of the speaker. The content catalog has marked some sessions as “at capacity”, but experience has taught us that there are always a couple of no-shows. Plans change during VMworld. People register for a session they would like to attend but get pulled into an interesting conversation along the way. Or sometimes you suffer from information overload and want to catch a breather. In many cases, spots open up at sold-out sessions, so it’s always worth walking up to a sold-out session and trying your luck.
REPEAT SESSION VSPHERE CLUSTERING DEEP DIVE AT VMWORLD EUROPE
Good news for the VMworld attendees who could no longer sign up for the vSphere Clustering Deep Dive session on Tuesday. I’m happy to announce that the VMworld team scheduled a repeat of the vSphere Clustering Deep Dive session on Thursday 08 November from 10:30 to 11:30. Session outline: In this session, Duncan and Frank will take you through the trenches of VMware vSphere Distributed Resource Scheduler (DRS) and vSphere High Availability (HA). Find out about options to optimize your DRS settings for your specific requirements and goals, such as whether you should load balance on active or consumed memory, what has recently changed in the DRS algorithm, and whether it will impact DRS behavior. For vSphere HA, you will learn when it restarts virtual machines (VMs), what kind of restart times to expect, and where you can find evidence that one or more VMs have been restarted. You will find out about all of these items and more. Prepare to dive deep, as the basics will not be covered. Don’t wait too long to register; VMworld Europe room sizes max out at 400 people. We hope to see you there!
COMPUTE POLICY IN VMWARE CLOUD ON AWS
The latest update of VMware Cloud on AWS introduced a new feature called compute policies. In its initial release, compute policies provide the ability to configure affinity rules and mobility control based on declarative policies and vSphere tags.

Management of affinity rules
Historically, affinity rules are part of the cluster configuration. Within VMware Cloud on AWS, cluster configuration is controlled by VMware, and thus customers cannot set affinity rules for virtual machines running within the SDDC. Instead of merely pulling the affinity rules configuration out of the cluster configuration, we decided to improve the affinity functionality and work towards a more uniform and consistent experience across multiple clouds.

The road to declarative policies
Within a declarative system, you describe what you want to happen. This is the opposite of imperative operations, where you specify actions. Declarative commands define state, and to some extent affinity rules are declarative statements. Let’s take VM anti-affinity rules as an example. You want to keep VM1 and VM2 separated, in different fault domains. Instead of taking the imperative actions of pinning VM1 to host A and pinning VM2 to host B, you create an anti-affinity rule with VM1 and VM2 as members. You state that these two VMs should not run on the same ESXi host. vCenter (DRS) controls placement and takes the necessary actions to resolve any violations of this intent. We want to apply this model to other features. Instead of logging into vCenter to deal with configuration issues and manually correcting the situation, we want vCenter to manage these functions on your behalf. The way you interact with vCenter, in this more declarative way, is with policies. Instead of specifying detailed imperative actions, you declare your intent, and the only thing you need to monitor after that is whether the policy is compliant or not.
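The difference between declaring intent and issuing actions can be illustrated with a small Python sketch. This is my own toy model, not VMware’s compute policy API: the function and data shapes are invented for illustration. Compliance checking for a VM-VM anti-affinity rule reduces to asking which rule members currently share a host.

```python
from itertools import combinations

def anti_affinity_violations(placement, rule_members):
    """Return pairs of rule members currently on the same host.

    placement: dict mapping VM name -> host name (observed state).
    rule_members: VMs covered by one VM-VM anti-affinity rule.
    An empty result means the declared intent is met; otherwise
    the placement engine (not the operator) migrates VMs until
    the violating pairs disappear.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(rule_members), 2)
        if a in placement and placement[a] == placement.get(b)
    ]

# VM1 and VM2 share host-a, violating the rule {VM1, VM2}.
placement = {"VM1": "host-a", "VM2": "host-a", "VM3": "host-b"}
```

Note that the rule never mentions host names: the operator declares the desired state (“keep these VMs apart”) and monitors compliance, while the system owns the imperative steps to get there.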
We have to start somewhere, so we concentrated on affinity rules (VM-VM and VM-Host) and anti-mobility (vMotion disabled) policies. Once we have this more abstract way of interacting with vCenter Server, it provides more advantages. One of them is an additional level of abstraction, and abstraction allows for a more uniform and consistent experience across multiple clouds. With today’s on-prem setup, you configure your cluster for a particular workload, and this could inhibit the ability to move that workload to another cluster, on-prem or even in the cloud. To make bursting out to VMware Cloud environments easy, you want this to be seamless. The direction we are heading in is that you no longer need configurations that are specific to on-prem, in-cloud, or at-edge clusters. Ideally, you express what you want, and it is the job of the cloud control plane, such as vCenter, to push this configuration to whatever environment the workload is presently in, whether that is an on-prem cluster or an in-cloud cluster.

Compute policies are active at vCenter level
Due to this model, the rules are decoupled from the cluster level and are now managed at the vCenter level. If you configure a VM-VM anti-affinity rule and move the VMs to another cluster, the policy remains active. At the time of writing, VMware Cloud on AWS allows the customer to create 10 clusters per SDDC. Clusters can span multiple AWS availability zones (AZs). The VM-Host affinity ruleset allows customers to tag the hosts per AZ and tag the VMs that need to remain in that availability zone. You can move the VMs between clusters within the same AZ; the compute policy remains active while vCenter ensures the compliance of the rule.

Introduction of firm rules
An interesting fact is that the VM-Host rules are firm rules. These firm rules differ from the traditional soft (should run on) and hard (must run on) rules; they sit in between the two.
DRS cannot violate these rules; the only exception is when a host is placed in maintenance mode. This ensures that during normal operations the rules are never broken, while still giving VMware the ability to service the SDDC. The only time a host is placed into maintenance mode in VMware Cloud on AWS is during upgrades, which are handled by VMware and communicated well before the service window. This allows the customer to prepare a strategy for these virtual machines well ahead of the service window. In the next article, I will go through the steps of creating a compute policy.
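The firm semantics can be sketched in a few lines of Python. To be clear, this is an illustrative model of the behavior described above, not VMware’s actual DRS placement code; the function name and inputs are invented. A soft rule could always be overridden, a hard rule never; a firm rule is honored unless every rule host is being evacuated for servicing.

```python
def candidate_hosts(hosts, firm_rule_hosts, in_maintenance):
    """Hosts on which the scheduler may place a VM under a firm
    VM-Host affinity rule (illustrative model only).

    hosts: all hosts in scope.
    firm_rule_hosts: hosts the VM is affined to (e.g. tagged for
    one availability zone).
    in_maintenance: hosts currently being evacuated.

    The rule is honored whenever any rule host is available;
    only when all rule hosts are in maintenance mode may the VM
    land elsewhere, so the provider can service the SDDC.
    """
    preferred = [h for h in hosts
                 if h in firm_rule_hosts and h not in in_maintenance]
    if preferred:
        return preferred
    # Every rule host is in maintenance: the firm rule yields.
    return [h for h in hosts if h not in in_maintenance]

# Example inventory: three hosts, VM affined to h1 and h2.
hosts = ["h1", "h2", "h3"]
```

Under this model, normal operations never break the rule; only a full evacuation of the tagged hosts (i.e., a VMware-driven service window) widens the candidate set.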
MY NEW ROLE
A couple of months ago I joined the Office of the CTO of the Cloud Platform Business Unit and started reporting directly to the CTO, Kit Colbert. Kit asked me to select a few areas to focus on. One of these areas is running Kubernetes on vSphere. I’ve increased my focus on Kubernetes, as this architecture becomes increasingly important in the datacenter. When talking to customers, two questions I ask are: what is the current ratio of VMs to containers in your data center, and what is the most popular format of deployment today? The common response is 90% VMs, while 90% of net-new deployments are containers. Today’s trend moves away from installing shrink-wrapped software and towards revenue-critical applications custom-built by in-house development teams. The standard tool for developers is container-based infrastructure, and Kubernetes is the de facto choice for container orchestration, offering many infrastructure-focused options. The operations that interest me are the high-availability and resource-management operations. At first glance, these operations appear to replace the HA and DRS processes, but on closer inspection they strongly augment each other. At VMworld in Las Vegas, Michael Gasch and I presented the session “Deep Dive: The Value of Running Kubernetes on vSphere” (CNA1553BU). If you are not going to VMworld Europe, I recommend watching the video recording; if you are going to VMworld Europe, I recommend you sign up. One thing you can expect from me is more Kubernetes-focused articles. One of the things I noticed is that many articles are written by cloud-native natives for cloud-native natives; that is, they rely on extensive previous exposure to this ecosystem. I’m trying to cover some of the challenges I have faced and the quirks I notice as a “newcomer”.
HELP US MAKE VMOTION EVEN BETTER
The vMotion product team is looking for input on how to improve vMotion. vMotion has proven to be a paradigm shift in datacenter management, and workload mobility is a must-have requirement in today’s datacenter operational model. vMotion handles the majority of workloads flawlessly. However, there are some corner cases that introduce challenges. The vMotion product team is interested in these corner cases, so they can improve the vMotion architecture and bring workload mobility to all workloads everywhere.
TERMINAL AFFINITY POLL
We are looking into the combination of licensed workloads and hard affinity rules (must-run-on rules). If you deploy this in your environment right now, how do you deal with it during maintenance windows? Your input helps shape future features. (Scroll down in the survey window to reach the Done button and submit your response.)