DECOUPLING OF CORES PER SOCKET FROM VIRTUAL NUMA TOPOLOGY IN VSPHERE 6.5

ESXi 6.5 introduces changes to the sizing and configuration of the virtual NUMA topology of a VM. A big step forward in improving performance is the decoupling of the Cores per Socket setting from virtual NUMA topology sizing. Understanding this elemental behavior is crucial for building a stable, consistent and properly performing infrastructure. If you are using VMs with a non-default Cores per Socket setting and are planning to upgrade to ESXi 6.5, please read this article, as you might want to set a host advanced setting before migrating VMs between ESXi hosts. More details about this setting are located at the end of the article, but let's start by understanding how the Cores per Socket setting impacts the sizing of the virtual NUMA topology.
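To make the decoupling concrete, here is a conceptual sketch of how the virtual NUMA node count could be derived before 6.5 (where Cores per Socket dictated the virtual node size) versus the 6.5 behavior (where the physical host topology drives sizing). This is a simplified model, not actual VMkernel logic, and the 10-cores-per-node host in the example is an assumption for illustration:

```python
# Conceptual sketch of virtual NUMA sizing; not actual VMkernel logic.

def vnuma_nodes_pre65(vcpus, cores_per_socket):
    # Before ESXi 6.5, the Cores per Socket setting dictated the virtual
    # NUMA topology: each virtual socket became a virtual NUMA node.
    return vcpus // cores_per_socket

def vnuma_nodes_65(vcpus, cores_per_pnode):
    # In ESXi 6.5 the virtual NUMA topology is decoupled from Cores per
    # Socket and sized against the physical NUMA node instead.
    return max(1, -(-vcpus // cores_per_pnode))  # ceiling division

# A 16-vCPU VM configured with 2 cores per socket, running on a host
# with 10 cores per physical NUMA node:
print(vnuma_nodes_pre65(16, 2))   # 8 virtual NUMA nodes (one per socket)
print(vnuma_nodes_65(16, 10))     # 2 virtual NUMA nodes, matching the host
```

In other words, a non-default Cores per Socket value could previously fragment a VM across many small virtual NUMA nodes, while 6.5 sizes the topology against the physical NUMA nodes regardless of that setting.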

VMWARE CLOUD ON AWS AT RE:INVENT

Thank you for all the great feedback since we announced our partnership with Amazon Web Services (AWS) on October 13! We have seen a lot of interest in VMware Cloud on AWS (VMC) from customers, partners, industry analysts, and social media. Following the announcement in San Francisco, we went on to Barcelona for VMworld Europe and had multiple sold-out sessions with our customers and partners in attendance. The #VMWonAWS hashtag on Twitter was pretty active as well, and we had our hands full answering all your questions! Next stop is AWS re:Invent in Las Vegas. Unfortunately I won't be at re:Invent, but the core group of the VMware Cloud on AWS product team is. VMware is a Platinum Sponsor at re:Invent and the team is eager to talk about the service offering, use cases, architecture, Tech Preview demos and a lot more. Here are the top three ways to get the most out of re:Invent:

ENT317 - VMware and AWS Together - VMware Cloud on AWS
Thursday, Dec 1, 2:00 PM - 3:00 PM
Location: Venetian Level 3, Murano 3205 (please check the exact location on the portal)
Speakers: Matt Dreyer, VMware Product Management; Paul Bockelman, AWS Sr. Solutions Architect
Description: VMware Cloud™ on AWS brings VMware's enterprise-class Software-Defined Data Center software to Amazon's public cloud, delivered as an on-demand, elastically scalable service that is sold, operated and supported by VMware, for any application and optimized for next-generation, elastic, bare-metal AWS infrastructure. This solution enables customers to use a common set of software and tools to manage both their AWS-based and on-premises vSphere resources consistently. Furthermore, virtual machines in this environment have seamless access to the broad range of AWS services. This session will introduce this exciting new service and examine some of the use cases and benefits of the service. The session will also include a VMware Tech Preview that demonstrates standing up a complete SDDC cluster on AWS and performing various operations using standard tools like vCenter.

PTS205 - VMware Cloud on AWS
Wednesday, Nov 30, 1:30 PM - 1:45 PM
Location: Partner Theater - Expo Hall
Speaker: Marc Umeno, VMware Product Management
Description: Learn how VMware and AWS are joining hands to deliver a new vSphere-based service running on next-generation, elastic, bare-metal AWS infrastructure with seamless integration with AWS services.

VMware Booth 2525 at Sands Expo, Hall D. Full exhibitor list and map is here.
Tue Nov 29th 5-7 pm; Wed Nov 30th 10:30 am-6 pm; Thu Dec 1st 10:30 am-6 pm
Description: We have three demo pods: a) VMware Cloud on AWS, b) Networking & Security, and c) Cloud Management.

Beyond the sessions and booth, you can also engage with us in the following ways:
Sign up for Beta, news updates, or both
Follow us on Twitter @vmwarecloud
Ask a question, share a use case, or just give us a shout-out using the #VMWonAWS hashtag
Pick a 30-minute slot to talk to a member of the VMware Cloud on AWS product team 1:1

Thank you and we hope to see you there! https://www.youtube.com/watch?v=wBai31IywlI&t=49s

MY THOUGHT ON 800 PAGE VCDX DESIGNS

Although I'm no longer participating in the VCDX program, I still hold it dear to my heart. Many aspiring VCDXes approach me and seek guidance on how to successfully pass the last part of the VCDX process, the defense. Typically this starts with a discussion on the design itself, and particularly how many pages the design should be comprised of. I've heard stories about people advocating 800-page designs. That makes me laugh, but mostly cry. Let's go back to the essence of the program and understand that the VCDX program was established to validate that someone is a skilled architect, someone who can assist IT organizations in building a successful vSphere architecture. In short, it's just a stamp of approval of your skill as an architect. Now with that in mind, how many skilled architects hand in an 800-page vSphere design document to a customer? How many customers would accept that? We are not in the business of writing the next Lord of the Rings novel. I've worked on complex and massive architectures, and most designs didn't touch 150 pages. When reviewing such 800-page designs, I noticed they are more a cut-and-paste of official documentation on how certain features work. It's imperative that you know the inner workings of the pillars and foundation of your architecture, but your design should not be a thesis or a showcase of your knowledge of the products. A design should highlight the requirements, the constraints and the chosen direction and technology. It should explain the workings of the used technology in a short and concise manner: explain how the technology meets the customer requirements and whether certain constraints require you to deviate from the default settings. Document thoroughly the effect of the chosen design on the service levels of the applications and architecture. I feel that some people try to portray the defense as a herculean feat.
And to be honest, if you haven't operated as an architect for multiple customers, it might feel that way. But if you are an architect who has worked on multiple designs, who recognizes the differences in risk-awareness culture between companies and knows how to cater to that need, and who can drill down to the essence and explain why a certain requirement impacts a design decision and what effect this has on service levels or other requirements, you should be fine! Try not to see it as the Mount Everest of your career; see passing the defense as a ceremony that validates your upward path toward being a great architect. Do what you've always been doing. If you have provided your customers with 100 to 200 page designs, keep on doing that and submit such a design for your VCDX defense.

HOST NOT READY ERROR WHEN INSTALLING NSX AGENTS

Management summary: Make sure your NSX Controller is connected to a distributed vSwitch instead of a standard vSwitch.

During the install process of NSX, my environment refused to install the NSX agents on the host. When you prepare the host clusters for network virtualization, a collection of VIBs is installed on each ESXi node of the selected cluster. This process installs functionality such as Distributed Routing, Distributed Firewall and the user world agent that allows the distributed vSwitch to evolve into an NSX Virtual Switch. Unfortunately, this process didn't go as smoothly as the other processes, such as installing the NSX Manager and deploying the NSX Controller. Each time I selected Install at Host Preparation (within vCenter, select Networking & Security > Installation > Host Preparation, select the cluster and click the Install link), the process returned the error "Host Not Ready". The recent task view showed that the task could not be completed, and the event log entry was not very helpful for troubleshooting the error. I followed KB article 2075600 (Installation Status appears as Not Ready in NSX) and made sure time and DNS were set up correctly, but unfortunately that didn't solve the problem. Then I started to dissect what Install at Host Preparation actually does and how the components connect to each other. This made me review the settings of the NSX Manager, and I discovered I had selected the port group designated for my management VMs on the standard switch instead of the distributed switch. It makes sense to connect it to a distributed switch; maybe this is the reason why many write-ups on how to install NSX assume this is basic knowledge and fail to list it as a requirement. The UI allows you to select a standard vSwitch port group or a distributed port group. Don't make the same mistake I made; make sure you select the appropriate distributed port group.

VMWARE CLOUD ON AWS - ELASTIC DRS PREVIEW

The VMworld Europe keynote featured the future VMware Cloud on AWS service. In short, this service gives VMware customers instant scale and global reach delivered by AWS, while they continue to use their own skill set driving and operating VMware SDDC environments on-prem and in-cloud. It avoids the risk that comes with re-platforming and re-architecting the current application landscape to run on a different platform, while providing the same service. In turn, it allows the IT organization to connect current applications with AWS' vast service catalog and use services like RDS, Redshift, Glacier and many more. One of the interesting features under tech preview is Elastic DRS. Elastic DRS helps to solve one of the toughest challenges an IT architect can face: capacity planning. Major components of capacity planning are current and future resource demand, failure recovery capacity and maintenance capacity. Finding the right balance between maintaining workload performance and the CAPEX and OPEX downside of reserved failover capacity is difficult. By leveraging the IT-at-scale operations of AWS, Elastic DRS transforms vSphere clusters into an agility powerhouse. Rapid scaling allows you to add additional hosts to the cluster. No more ordering new hardware or racking and stacking; just add the new host to the cluster with a right-click of the mouse. By using native metrics, DRS can detect that the cluster is running out of host resources and present a recommendation to add another host. Like regular DRS, you can also put Elastic DRS into automatic mode and allow it to add or remove hosts based on the observed load on the cluster. Sometimes we forget how extremely complex running IT at super scale is. Automating the installation, configuration and operation of one host is interesting; doing this by the dozen already pushes the limits for a lot of IT organizations.
Now think about doing this in more than a dozen datacenters around the world at the same time, while being required to do it instantly when a customer wants it. Undeniably impressive. When joining the team, learning about Elastic DRS was exciting; understanding how this works for all the customers in all the AWS datacenters around the world is just mind-blowing! IT-at-scale at its finest. Having ready-to-go ESXi hosts at your fingertips allows you to do many cool things, for example allowing DRS to aid and assist vSphere HA. Since ESXi 3.0, vSphere HA has ensured that workloads are restarted on the surviving hosts in the cluster. However, when a host outage is not temporary but permanent, application performance can be impacted by the longer-term reduction of available host resources. Auto remediation helps to address this challenge. Auto remediation builds upon Elastic DRS and ensures that the available host resources remain consistent during an ESXi host outage. When a host failure is detected, auto remediation adds another host to the cluster, ensuring that workload performance will not be impacted in the long run by a host failure. If a partial (hardware) failure occurs, auto remediation ensures that VSAN operations complete before ejecting the degraded host. Another benefit of this framework is the ability to retain similar levels of resources during maintenance. Typically during maintenance operations, hosts are patched and temporarily unavailable to run and service applications. Many IT organizations deal with this situation by either oversizing the cluster or by offering SLAs that provide a reduced service during maintenance hours. With Elastic DRS, the cluster size is not reduced during maintenance operations. This way workloads are not impacted by a loss of resources and continue to perform as they do during normal operation hours. To emphasize: this is a technical preview of a service that is not operational yet.
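As a purely conceptual illustration of the add/remove recommendation described above, a threshold-based scaling policy could look like the sketch below. The metrics, watermarks and minimum cluster size are illustrative assumptions; the actual Elastic DRS algorithm and its parameters are not described in this article:

```python
# Illustrative sketch of a threshold-based elastic scaling policy.
# The thresholds and metrics are hypothetical, not the real Elastic DRS logic.

def scaling_recommendation(cpu_demand_pct, mem_demand_pct,
                           host_count, min_hosts=4,
                           high=80.0, low=40.0):
    """Return 'scale-out', 'scale-in', or 'none' for a cluster.

    cpu_demand_pct / mem_demand_pct: cluster-wide demand as a
    percentage of available capacity.
    """
    # Running hot on either resource: recommend adding a host.
    if cpu_demand_pct > high or mem_demand_pct > high:
        return "scale-out"
    # Both resources well below the low watermark and the cluster is
    # above its minimum size: a host can be removed.
    if cpu_demand_pct < low and mem_demand_pct < low and host_count > min_hosts:
        return "scale-in"
    return "none"

print(scaling_recommendation(85.0, 60.0, 4))   # scale-out
print(scaling_recommendation(30.0, 25.0, 6))   # scale-in
print(scaling_recommendation(30.0, 25.0, 4))   # none (cluster at minimum size)
```

In manual mode such a policy would only surface the recommendation; in automatic mode the service would act on it, which is exactly the split the article describes for regular DRS versus its automatic setting.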
For more info about VMware Cloud on AWS, take a closer look.

VMWARE CLOUD™ ON AWS – A CLOSER LOOK

After keeping this silent for a long time, I can finally share a little bit of what I've been focusing on at VMware. (This is a repost of content on blogs.vmware.com.) Today, VMware and Amazon Web Services (AWS) are announcing a strategic partnership providing the ability to run a full VMware Software-Defined Data Center (SDDC) as a cloud service on AWS. This service will include all the enterprise tools you're familiar with, including vSphere, ESXi, VSAN and NSX. This article provides a technical preview of the new service, VMware Cloud on AWS (VMC), allowing me to give you a sneak peek of the incredibly cool stuff that is coming. This architecture is a match made in heaven if you ask me. It allows administrators and architects who are used to vSphere to leverage the agility of AWS without re-architecting applications and reconstructing operational procedures. One great advantage is that vCenter will be the main platform of operations; therefore all tools that you currently run against vCenter in your on-premises vSphere deployment will work with the in-cloud SDDC environment. All these tools and functionalities that have been developed over the years are now coming together and provide an environment that allows workload mobility between clouds while pushing data center agility to new levels. In short, once signed up, you select a cluster size and an SDDC environment is created for you in a very short time. To emphasize (and to avoid any misconception), the VMware cloud will run native ESXi on next-generation, bare-metal AWS infrastructure. The VMware cloud will be deployed as a private cloud containing vSphere ESXi hosts, VSAN and NSX on AWS infrastructure. This will allow you to run enterprise workloads with the same performance, reliability and availability levels as your on-premises vSphere deployments, but now on an AWS architecture.
The main difference between the on-prem and in-cloud deployment is that VMware manages and operates the infrastructure of the VMware Cloud on AWS. It is important to note here that this is a fully managed service. That is to say, VMware will install, manage and maintain the underlying ESXi, VSAN, vCenter and NSX infrastructure. Routine operations like patching or hardware failure remediation will be taken care of by VMware as part of the service. Customers will have delegated permissions to things like vCenter and will be able to use vCenter to perform administrative tasks but there will be some actions like patching which VMware will provide to you as part of the service. This means that VMware takes care of the core infrastructure in partnership with AWS. VMware Cloud on AWS will be available as a stand-alone deployment, as a Hybrid cloud deployment or as a cloud-to-cloud deployment. With hybrid and cloud-to-cloud deployments, vCenter enhanced linked-mode provides a single pane of glass that assists IT operation teams to manage the SDDC deployments from a centralized console. NSX extends this single pane of glass by providing consistent network and security services between the various deployments. However, NSX is not a requirement! If you are not running NSX on premise right now, you will still be able to run VMware Cloud on AWS but you won’t be able to utilize the hybrid cloud features of NSX until you do. With the ability to span networks and clouds, vMotion provides workload mobility, allowing the movement of workloads in and out the various cloud deployments. Yes, you read that correctly, you can vMotion from your existing on-premises vSphere environment to AWS! One of the interesting concepts is elastic scaling. Elastic scaling would help to solve one of the toughest challenges an IT architect can face: capacity planning. Major key points of capacity planning are current and future resource demand, failure recovery capacity and maintenance capacity. 
Finding the right balance between maintaining workload performance and the CAPEX and OPEX downside of reserved failover capacity is difficult. Think about how elastic scaling would transform vSphere clusters into agile powerhouses. Instead of going through the tedious procurement and installation process yourself, you benefit from the IT-at-scale mindset and services delivered by AWS. Since ESXi 3.0, vSphere HA has enabled workloads to restart on the surviving hosts in the cluster. However, when a host outage is not temporary, host resources can become constrained due to the reduction in available hosts. Auto-remediation builds upon elastic scaling, ensuring that available host resources remain consistent during an ESXi host outage. When a host failure is detected, auto-remediation adds another host to the cluster, ensuring that workload performance will not be impacted in the long run by a host failure. If a partial (hardware) failure occurs, auto-remediation ensures that VSAN operations complete before ejecting the degraded host. Another benefit of this framework is the ability to retain similar levels of resources during maintenance. During maintenance operations, the cluster size is not reduced, so workloads are not impacted by a loss of resources and continue to perform as they do during normal operation hours. I believe one of the strengths of the VMware Cloud on AWS service is that it allows administrators, operations teams and architects to use their existing skill set and tools to consume AWS infrastructure. You can move workloads to the cloud without having to replatform them in any way: no conversion of virtual machines, no repackaging and, very importantly, no extensive testing; you just migrate the VM. Another strength is the ability to pair current workloads with the advanced feature set of AWS. As a result, IT teams will be able to extend their skill set by discovering the vast catalog of services AWS has to offer.
This creates an environment that works seamlessly with both on-premises private clouds and advanced AWS Public Cloud Services. There are so many other great features that I want to cover, but let’s save that for future articles.

VMWORLD GEEK WHISPERERS PODCAST - CHOOSING TITLES YOU WANT TO HAVE

Amy Lewis asked me to appear on the Geek Whisperers Live podcast at VMworld 2016 in Las Vegas, and as always I had a blast discussing various topics with Amy, Matt, and John. In this talk, we spoke about becoming an evangelist, the challenges an evangelist faces, and why you don't want to pick the title of evangelist yourself. Of course, while interacting with this magnificent group of people you tend to talk about a lot more. So go on and check it out: http://geek-whisperers.com/2016/09/choosing-titles-you-want-to-have-wfrank-denneman-at-vmworld-2016-episode-120/

I'M COMING HOME

I’m excited to announce that I’ve accepted a position at VMware as Senior Staff Architect. I can’t share the details of this next-level product that I will be working on right now. But I look forward to sharing more information when the time is right. I cannot wait to get started. #GameOn!

NUMA DEEP DIVE PART 5: ESXI VMKERNEL NUMA CONSTRUCTS

ESXi is optimized for NUMA systems and contains both a NUMA scheduler and a CPU scheduler. When ESXi runs on a NUMA platform, the VMkernel activates the NUMA scheduler. The primary role of the NUMA scheduler is to optimize the CPU and memory allocation of virtual machines by managing initial placement and by dynamically load balancing virtual machine workloads across the NUMA nodes. The allocation of physical CPU resources to virtual machines is carried out by the CPU scheduler.
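As a conceptual illustration of the first responsibility, initial placement, consider a toy model that assigns each VM to the NUMA node with the most free memory. This is a deliberate simplification for intuition only; the VMkernel's actual placement algorithm weighs more factors (CPU load, locality, and more) and is not reproduced here:

```python
# Toy model of NUMA initial placement: put each new VM (a "NUMA client")
# on the node with the most free memory. Not the VMkernel's algorithm.

def place_vm(nodes, vm_mem):
    """nodes: dict of node id -> free memory (MB). Returns the chosen node
    and deducts the VM's memory from that node's free capacity."""
    node = max(nodes, key=nodes.get)        # node with the most free memory
    if nodes[node] < vm_mem:
        raise ValueError("no NUMA node has enough free memory for this VM")
    nodes[node] -= vm_mem                   # claim memory on the chosen node
    return node

# Two-node host: node 0 has 64 GB free, node 1 has 48 GB free.
nodes = {0: 65536, 1: 49152}
print(place_vm(nodes, 8192))    # 0 — node 0 has the most free memory
print(place_vm(nodes, 8192))    # 0 — node 0 is still ahead after one placement
```

After placements like these skew the load, the second responsibility kicks in: the NUMA scheduler periodically rebalances clients across nodes, which a model this simple does not capture.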