INITIAL PLACEMENT OF A VSPHERE POD

Project Pacific transforms vSphere into a unified application platform. This new platform runs both virtual machines and Linux containers as native workload constructs. Just introducing Linux containers as a new workload object is not enough. To manage containers properly, you need a legitimate orchestrator. On top of that, you need to make sure that existing services, such as DRS, can handle the different lifecycles of these different objects. Containers typically have a much shorter lifecycle than virtual machines: VMs can “live” for years, whereas containers have a far shorter life expectancy. This very different churn rate affects the initial placement and load-balancing operations of resource management services.

MULTI-GPU AND DISTRIBUTED DEEP LEARNING

More enterprises are incorporating machine learning (ML) into their operations, products, and services. Similar to other workloads, a hybrid-cloud strategy is used for ML development and deployment. A common strategy is using the excellent toolset and training data offered by public cloud ML services for generic ML capabilities. These ML activities typically improve an organization’s quality of service and increase productivity. But the real differentiation lies in using the organization’s unique data and know-how to create what’s called differentiated machine learning. The data used is primarily generated by the organization’s own processes or through interaction with its customers. As a result, specific rules and regulations come into play when handling and storing that data. Another strong factor in determining where to deploy ML activities is data gravity. Placing compute close to where the data is generated provides a consistent (often high-performing) service. As a result, many organizations invest in the infrastructure needed to deploy ML and deep learning (DL) solutions.

MACHINE LEARNING WORKLOAD AND GPGPU NUMA NODE LOCALITY

In the previous article, “PCIe Device NUMA Node Locality”, I covered the physical connection between the processor and the PCIe device and briefly touched upon machine learning workloads with regard to PCIe NUMA locality. This article zooms in on why it is important to consider PCIe NUMA locality.

General-Purpose Computing on Graphics Processing Units

New compute-intensive workloads take advantage of a programming model called general-purpose computing on GPU (GPGPU). With GPGPU, the many cores integrated on modern GPUs are used to offload a vast number of (parallel) compute threads from the CPU. By adding another computational device with different characteristics, a heterogeneous compute architecture is born. GPUs are optimized for streaming sequential (or easily predictable) access patterns, while CPUs are designed for general access patterns and concurrency of threads. Combined, they form a GPGPU pipeline that is exceptionally well-suited to analyzing data. The vSphere platform is well-suited to creating GPGPU pipelines, and optimizations are provided to VMs, such as DirectPath I/O Access (also known as Passthrough). Passthrough allows the application to interface with the accelerator device directly; however, data must still be transferred from disk or network through system memory (RAM) to the GPU. Controlling that data transfer matters for the overall performance of the platform, for both GPGPU and non-GPGPU workloads.

PCIE DEVICE NUMA NODE LOCALITY

During this Christmas break, I wanted to learn PowerCLI properly. As I’m researching the use cases of new hardware types and workloads in the data center, I managed to produce a script that identifies the PCIe device to NUMA node locality within a VMware ESXi host. The script set contains a script for each of the most popular data center PCIe device types that can be assigned as a passthrough device. The current script set is available on GitHub and contains scripts for GPUs, NICs, and (Intel) FPGAs.

VSPHERE 6.5+ DRS PAIRWISE BALANCING

Or maybe I should have called this blog post “I’m seeing an excessive number of DRS-initiated vMotions on my newly upgraded 6.5 environment”. Recently I was part of a few conversations about the nature of DRS load balancing in systems running vSphere 6.5 and newer. It was noticed that more vMotion operations were occurring since the upgrade to 6.5, and it’s highly likely that these operations occur due to the new DRS pairwise balancing functionality. Pairwise balancing was introduced in vSphere 6.5 and is focused on keeping the host resource utilization disparity within a certain threshold. As a result, DRS performs load-balancing operations if the difference between the lowest-utilized host and the highest-utilized host exceeds a certain percentage. That percentage depends on your migration threshold. The default migration threshold uses a 20% tolerable difference in utilization.
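The pairwise check described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual DRS algorithm: the HostLoad type, the pairwise_imbalance function, and the single normalized utilization figure per host are all assumptions made for the example. Only the 20% default tolerance comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HostLoad:
    """Hypothetical per-host record; DRS itself tracks far more detail."""
    name: str
    utilization_pct: float  # assumed normalized utilization, 0-100


def pairwise_imbalance(
    hosts: List[HostLoad], tolerance_pct: float = 20.0
) -> Optional[Tuple[HostLoad, HostLoad, float]]:
    """Return (busiest, idlest, gap) when the utilization gap between the
    highest- and lowest-utilized host exceeds the tolerance, else None."""
    if len(hosts) < 2:
        return None
    busiest = max(hosts, key=lambda h: h.utilization_pct)
    idlest = min(hosts, key=lambda h: h.utilization_pct)
    gap = busiest.utilization_pct - idlest.utilization_pct
    if gap > tolerance_pct:
        return busiest, idlest, gap
    return None


if __name__ == "__main__":
    cluster = [
        HostLoad("esxi-01", 45.0),
        HostLoad("esxi-02", 70.0),
        HostLoad("esxi-03", 52.0),
    ]
    result = pairwise_imbalance(cluster)
    if result is None:
        print("Cluster is within the tolerable utilization difference")
    else:
        busiest, idlest, gap = result
        print(f"{busiest.name} vs {idlest.name}: gap of {gap:.1f} points")
```

In this example the gap between esxi-02 and esxi-01 is 25 percentage points, which is above the 20% tolerance, so the sketch flags the pair as a candidate for load balancing. A cluster whose hosts all sit within 20 points of each other returns None.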

AMD EPYC NAPLES VS ROME AND VSPHERE CPU SCHEDULER UPDATES

Recently AMD announced the 2nd generation of the AMD EPYC CPU architecture, the EPYC 7002 series. Most refer to the new CPU architecture by its internal codename, Rome. When AMD introduced the 1st generation EPYC (Naples), they succeeded in setting a new record for core count and memory capacity per socket. However, due to the CPU’s multi-chip-module (MCM) architecture, it is not an apples-to-apples comparison with an Intel Xeon architecture. As each chip module contains a memory controller, each module presents a standalone NUMA domain. This impacts OS scheduling decisions and, thus, virtual machine sizing. A detailed look can be found here in English or here translated by Grigory Pryalukhin in Russian. Rome is different: the new CPU architecture is more aligned with the single-NUMA-node-per-socket paradigm, and this helps with obtaining workload performance consistency. There are still some differences between Xeons and Rome. In addition, we made some adjustments to the CPU scheduler to deal with this new architecture. Let’s take a closer look at the difference between Naples and Rome.
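To illustrate why the number of NUMA domains per socket matters for virtual machine sizing, here is a rough Python sketch. It assumes a VM's vCPUs are simply packed into as few NUMA nodes as possible based on core count alone, which ignores memory size, hyper-threading, and the actual ESXi NUMA scheduler settings. The function name and the example core counts (8 cores per NUMA domain for a 32-core Naples part, 64 cores per NUMA node for a 64-core Rome part with one NUMA node per socket) are illustrative assumptions, not figures taken from the article.

```python
import math


def numa_nodes_spanned(vcpus: int, cores_per_numa_node: int) -> int:
    """Smallest number of NUMA nodes needed to give every vCPU its own
    physical core, under the simplifying assumptions stated above."""
    if vcpus < 1 or cores_per_numa_node < 1:
        raise ValueError("vcpus and cores_per_numa_node must be positive")
    return math.ceil(vcpus / cores_per_numa_node)


if __name__ == "__main__":
    vm_vcpus = 12
    # Assumed example topologies (hypothetical labels for this sketch).
    topologies = {
        "Naples-style, 8 cores per NUMA domain": 8,
        "Rome-style, 64 cores per NUMA node": 64,
    }
    for label, cores in topologies.items():
        nodes = numa_nodes_spanned(vm_vcpus, cores)
        print(f"{label}: a {vm_vcpus}-vCPU VM spans {nodes} NUMA node(s)")
```

Under these assumptions, a 12-vCPU VM crosses a NUMA boundary on the Naples-style layout but fits within a single NUMA node on the Rome-style layout, which is the kind of sizing difference the paragraph above refers to.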

60 MINUTES OF NUMA VMWORLD SESSION COMMANDS

Verify Distribution of Memory Modules with PowerCLI

    Get-CimInstance -CimSession $Session CIM_PhysicalMemory | select BankLabel, Description, @{n='Capacity in GB';e={$_.Capacity/1GB}}

PowerCLI Script to Detect Node Interleaving

    Get-VMHost | select @{Name="Host Name";Expression={$_.Name}},
        @{Name="CPU Sockets";Expression={$_.ExtensionData.Hardware.CpuInfo.NumCpuPackages}},
        @{Name="NUMA Nodes";Expression={$_.ExtensionData.Hardware.NumaInfo.NumNodes}}

Action-Affinity Monitoring

    sched-stats -t numa-migration

Disable Action Affinity

    numa.LocalityWeightActionAffinity = 0

numa.PreferHT

For more information on how to enable PreferHT, see KB article 2003582.

    Host Setting: numa.PreferHT=1
    VM Setting: numa.vcpu.PreferHT = TRUE

5 THINGS TO KNOW ABOUT PROJECT PACIFIC

During the keynote of the first day of VMworld 2019, Pat unveiled Project Pacific. In short, Project Pacific transforms vSphere into a unified application platform. By deeply integrating Kubernetes into the vSphere platform, developers can deploy and operate their applications through a well-known control plane. Additionally, containers are now first-class citizens enjoying all the operations generally available to virtual machines. Although it might seem that the acquisition of Heptio and Pivotal kickstarted Project Pacific, VMware has been working on Project Pacific for nearly three years! Jared Rosoff, the initiator of the project and overall product manager, told me that over 200 engineers are involved, as it affects almost every component of the vSphere platform.

VMWORLD US 2019 - KNOW BEFORE YOU GO PODCAST

Last week I had the pleasure of connecting again with my friends and colleagues Pete Flecha, Duncan Epping, and amateur backup dancer to Pat Benatar, Mr. Ken Werneburg. During the podcast, we discussed the upcoming VMworld. As it is returning to San Francisco, it might be interesting to revisit your conference strategy. Although Moscone Center has been rebuilt and expanded, I believe we are still using all three buildings: North, South, and West (located at Howard and 3rd). So take at least a jacket with you; SF summers can be treacherous.

ALLEN, MCKEOWN, AND KONDO

The title is a reference to one of the most interesting books I have ever read, Gödel, Escher, Bach. Someone described it as “Read this book if you like to think about thinking, as well as to think about thinking about thinking”. The three books I want to share my thoughts on are, in a sense, feeding and shaping the behavior that allows you to clear your mind and focus more on the task at hand.