VSPHERE 8 CPU TOPOLOGY FOR LARGE MEMORY FOOTPRINT VMS EXCEEDING NUMA BOUNDARIES

By default, vSphere manages the vCPU configuration and vNUMA topology automatically. vSphere attempts to keep the VM within a NUMA node until the vCPU count of that VM exceeds the number of physical cores inside a single CPU socket of that particular host. For example, my lab has dual-socket ESXi hosts, and each host has 20 processor cores per socket. As a result, vSphere creates a VM with a uniform memory access (UMA) topology up to a vCPU count of 20. Once I assign 21 vCPUs, it creates a vNUMA topology with two virtual NUMA nodes and exposes this to the guest OS for further memory optimization.
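
A quick way to see what the guest actually received is to query the NUMA layout from inside a Linux VM. The advanced setting shown afterwards is the documented knob that overrides the automatic vNUMA node sizing; treat this as a minimal sketch, assuming a Linux guest and the 20-core-per-socket lab hosts from the example:

  # Inside the guest OS: how many NUMA nodes does the VM expose?
  lscpu | grep -i "numa node"          # the 21-vCPU VM reports: NUMA node(s): 2

  # Optional override (VM powered off), set via the .vmx file or Advanced Parameters:
  # cap each virtual NUMA node at 10 vCPUs instead of the auto-sized value
  numa.vcpu.maxPerVirtualNode = "10"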

UNEXPLORED TERRITORY PODCAST EP30 - PROJECT KESWICK WITH ALAN RENOUF

While preparing the podcast, I knew this episode would be good. Edge technology excites me immensely, and the way the project team strays away from the proverbial hammer and looks at ways to incorporate different principles like GitOps management concepts is inspiring. To top it off, you have Alan Renouf, a long-time colleague and friend, to talk about it, but unfortunately, COVID prevented me from partaking in this discussion. But, of course, Duncan and Johan had an excellent conversation with Alan. Please check it out on Spotify, Apple, or via our website. Enjoy!

VSPHERE 8 CPU TOPOLOGY DEVICE ASSIGNMENT

There seems to be some misunderstanding about the new vSphere 8 CPU Topology Device Assignment feature, and I hope this article will help you understand this feature and when to use it. The feature defines the mapping of a virtual PCIe device to the vNUMA topology, and its main purpose is to optimize guest OS and application performance. This setting does not impact NUMA affinity or the scheduling of vCPUs and memory locality at the physical resource layer; that remains governed by the (best-effort) VM placement policy. Let's explore the settings and their effect on the virtual machine, starting with the basics. The feature is located in the VM Options menu of the virtual machine.
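
To check how the assignment surfaces inside the guest, Linux exposes each PCIe device's NUMA affinity through sysfs. A minimal sketch, assuming a Linux guest and a hypothetical device address of 0000:03:00.0:

  # Inside the guest: which vNUMA node does this PCIe device report?
  cat /sys/bus/pci/devices/0000:03:00.0/numa_node   # prints the node ID, or -1 if none is set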

COULD NOT INITIALIZE PLUGIN ‘LIBNVIDIA-VGX.SO’ - CHECK SR-IOV IN THE BIOS

I was building a new lab with some NVIDIA A30 GPUs in a few hosts, and after installing the NVIDIA driver onto the ESXi host, I got the following error when powering up a VM with a vGPU profile. Typically that means one of three things:

- Shared Direct passthrough is not enabled on the GPU
- ECC memory is enabled
- The VM memory reservation was not set to protect its full memory range

But shared direct passthrough was enabled, and because I was using a C-type profile and an NVIDIA A30 GPU, I did not have to disable ECC memory, as described in section 3.4, Disabling and Enabling ECC Memory, of the NVIDIA Virtual GPU software documentation.
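
For reference, the ECC state can be inspected and, for profiles that do require it, toggled with nvidia-smi on the ESXi host; this is the procedure that section 3.4 describes. A minimal sketch:

  # On the ESXi host: show the current and pending ECC mode of the GPU
  nvidia-smi -q | grep -A 2 "Ecc Mode"

  # Disable ECC only for vGPU profiles that do not support it; takes effect after a reboot
  nvidia-smi -e 0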

SUB-NUMA CLUSTERING

I’m noticing a trend that more ESXi hosts have Sub-NUMA Clustering enabled. Typically this setting is used in the high-performance computing space or the telco world, where engineers need to shave off every last bit of latency and squeeze out every bit of bandwidth the system can offer. Such workloads are mostly highly tuned and operate in a controlled environment, whereas an ESXi server generally runs a collection of every type of workload in the organization imaginable. Let’s explore what Sub-NUMA Clustering does and see whether it makes sense to enable it in your environment.
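
Whether SNC is active on a host is easy to verify, because each physical socket then presents multiple NUMA nodes to ESXi. A minimal sketch, assuming a dual-socket host:

  # On the ESXi host: report the NUMA node count the hypervisor detected
  esxcli hardware memory get
  # A dual-socket host normally reports 2 NUMA nodes; with SNC-2 enabled, it reports 4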

VMWARE SESSIONS AT NVIDIA GTC

Overcome Your AI/ML Challenges with VMware + NVIDIA AI-Ready Enterprise Platform (Presented by VMware, Inc.) [A41422] Tuesday, Sep 20, 8:00 PM - 9:00 PM CEST / 11:00 AM Pacific Time (PT). Shobhit Bhutani, Justin Murray, and I are honored to present at NVIDIA GTC. In the session, we provide a deep-level overview of the VMware and NVIDIA AI-Ready Enterprise platform and the new ML-focused features of the vSphere 8 release.

VSPHERE 8 AND VSAN 8 UNEXPLORED TERRITORY PODCAST DOUBLE HEADER

This week we released two episodes covering the vSphere 8 and vSAN 8 releases. Together with Feidhlim O’Leary, we discover all the new functions and features of the vSphere 8 platform. You can listen to this episode on Spotify, Apple, or on our website: unexploredterritory.tech. Pete Koehler repeats his stellar performance of last time and helps us understand the completely new architecture of vSAN 8. You can listen to this episode on Spotify, Apple, on our website unexploredterritory.tech, or anywhere else you get your podcasts!

UNEXPLORED TERRITORY - VMWARE EXPLORE USA SPECIAL

This week Duncan and I attended VMware Explore to co-present the session “60 Minutes of Virtually Speaking Live: Accelerating Cloud Transformation” with William Lam and our buddies from the Virtually Speaking podcast, Pete Flecha and John Nicholson. The recordings should be available soon, or you can sign up for the session in Barcelona. During the week, we caught up with many people and captured soundbites from people such as Kit Colbert, Chris Wolf, Stephen Foskett, Sazzala Reddy, and a few more. You can listen to this special VMware Explore episode on Spotify (spoti.fi/3cITI7p), Apple (apple.co/3q35dJJ), or on our website: unexploredterritory.tech/episodes/

NEW VSPHERE 8 FEATURES FOR CONSISTENT ML WORKLOAD PERFORMANCE

vSphere 8 is full of enhancements. Go to blogs.vmware.com or yellow-bricks.com for more extensive overviews of the vSphere 8 release. In this article, I want to highlight two features of the new vSphere 8 version that will help machine learning (ML) workloads perform consistently and possibly faster than manually configured workload constructs. The two features that make this possible are the UI enhancements for the vNUMA topology and Device Groups.

Hardware 20 Scalability Enhancements

Before we dive into those features: vSphere 8 introduces a new virtual hardware version that allows us to introduce wonderful new capabilities and push the boundaries of the platform again. With vSphere 8, the virtual hardware level advances to version 20, and it offers new capabilities for ML accelerators. The supported number of DirectPath I/O devices per VM went up from 16 to 32. We also worked with NVIDIA to increase the support for vGPU devices, and now, with vSphere 8, each VM can support up to 8 vGPU devices.
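
The virtual hardware version is easy to confirm per VM, as it is stored in the VM's configuration file. A minimal sketch, assuming ESXi shell access and a hypothetical datastore path:

  # On the ESXi host: confirm the VM runs virtual hardware version 20
  grep virtualHW.version /vmfs/volumes/datastore1/myvm/myvm.vmx
  # expected output: virtualHW.version = "20"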

TRAINING VS INFERENCE - NETWORK COMPRESSION

This training versus inference workload series provides platform architects and owners with insights into ML workload characteristics. Instead of treating deep neural networks as black-box workloads, ML architectures and techniques are covered with infrastructure experts in mind. A better comprehension of the workload opens up the dialog between infrastructure and data science teams, hopefully resulting in a better match between workload requirements and platform capabilities. Part 3 of the series focused on the memory consumption of deep learning neural network architectures. It introduced the different types of operands (weights, activations, gradients) and how each consumes memory and requires computational power throughout the different stages and layers of the neural network. Part 4 showed how the choice of floating-point data type impacts a neural network’s memory consumption and computational power requirements. I want to cover neural network compression in this part of the training versus inference workload deep dive. The goal of neural network compression is inference optimization, either to help fit and run a model on a constrained endpoint or to reduce the running costs of the inference infrastructure.
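
To make the memory effect concrete, here is a back-of-the-envelope calculation (my illustrative numbers, not from the series, using the commonly cited ~25.6 million parameters of ResNet-50):

  FP32 weights: 25.6 M parameters x 4 bytes = ~102 MB
  INT8 weights: 25.6 M parameters x 1 byte  = ~26 MB   (a 4x reduction from quantization alone)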