MACHINE LEARNING ON VMWARE PLATFORM – PART 2
Resource Utilization Efficiency
Machine learning, especially deep learning, is notorious for consuming large amounts of GPU resources during training. However, as Part 1 already highlighted, machine learning is more than just training a model, and the other components within the machine learning workflow require large amounts of CPU, memory, storage, and network resources. Machine Learning on VMware Cloud Platform – Part 1 covered the three distinct phases: concept, training, and deployment. Existing “known data” is required to explore and train the model in both the concept and training phases. During the development of the model, it is common to use three different data sets: the training set, the validation set, and the test set. Creating data sets is not only about getting as much data as possible. It is even more critical to get meaningful, high-quality data, because the accuracy of the predictions produced by the model depends heavily on the quality of the data set used for training and validation. To get such a high-quality data set, the data science team needs to “wrangle” the existing raw data into shape. Data wrangling transforms the raw data into more valuable data that can be used as a data set “downstream” to train a model. And all this wrangling requires a lot of collateral infrastructure and services besides just a bunch of GPUs.
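To make those three data sets a bit more tangible, here is a minimal Python sketch of carving a wrangled data set into training, validation, and test sets. Assumptions are mine: scikit-learn as the library and an illustrative 70/15/15 split, not values prescribed by the workflow above.

```python
# Minimal sketch: splitting a data set into training, validation, and test sets.
# The 70/15/15 ratio and the random stand-in data are illustrative choices only.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
features = rng.normal(size=(10_000, 20))   # stand-in for wrangled feature data
labels = rng.integers(0, 2, size=10_000)   # stand-in for known labels

# First carve off the test set (15%), then split the remainder into
# training (70% of the total) and validation (15% of the total).
x_rest, x_test, y_rest, y_test = train_test_split(
    features, labels, test_size=0.15, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.15 / 0.85, random_state=42)

print(len(x_train), len(x_val), len(x_test))  # roughly 7000, 1500, 1500
```

The key point is that the test set is carved off first and never touched during training, so it can serve as an honest measure of the model’s accuracy.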
MACHINE LEARNING ON VMWARE PLATFORM - PART 1
Machine Learning is reshaping modern business. Most VMware customers look at machine learning to increase revenue or decrease cost. When talking to customers, we mainly discuss the details of the (vertical) training and inference stack. The stack runs a machine learning model inside a container or a VM, preferably on an accelerator device such as a general-purpose GPU. And I think that is mostly due to our company DNA, which leads us to relate machine learning workloads directly to hardware resources.
SOLVING VNUMA TOPOLOGY MISMATCH WHEN MIGRATING BETWEEN DUAL SOCKET SERVERS AND QUAD SOCKET SERVERS
I recently received a few questions from customers migrating between clusters with different CPU socket footprints. The challenge is not necessarily migrating live workloads between clusters, because we have Enhanced vMotion Compatibility (EVC) to solve this problem. For VMware users just learning about this technology: EVC masks certain unique features of newer CPU generations and creates a generic baseline of CPU features throughout the cluster. If workloads move between two clusters, vMotion still checks whether the same CPU features are presented to the virtual machine. If you are planning to move workloads, ensure the EVC modes of the clusters match to get the smoothest experience.
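If you want to verify this before migrating, something along the lines of the following pyVmomi sketch can compare the EVC mode of two clusters. The hostname, credentials, and cluster names are placeholders, and the EvcManager call assumes a reasonably recent vSphere API.

```python
# Hedged sketch: comparing the EVC modes of two clusters with pyVmomi.
# Connection details and cluster names are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="user", pwd="secret")  # add SSL handling as needed
try:
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.ClusterComputeResource], True)
    modes = {}
    for cluster in view.view:
        # currentEVCModeKey is None when EVC is disabled on the cluster
        modes[cluster.name] = cluster.EvcManager().evcState.currentEVCModeKey
    view.DestroyView()

    if modes.get("source-cluster") != modes.get("target-cluster"):
        print("EVC modes differ; align them before migrating workloads.")
finally:
    Disconnect(si)
```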
STOP DESIGNING YOUR SERVER PLATFORM WITH SOLELY THE CPU ROADMAP IN MIND
Over the last 20 years, we designed our core data center platform following the CPU roadmap. But in today’s world, the devices attached to the processor are making radical and revolutionary improvements, catering to the needs of new workloads. I’m talking about devices like the GPU, the network adapter, and its natural offspring, the data processing unit (DPU). In the article “Project Monterey and the need for network cycles offload for ML workloads” I zoom in on what’s in store for us data center architects in the upcoming years.
EXCITING SESSIONS FROM NVIDIA GTC FALL 2021
Over the last few weeks, I watched many sessions of the fall 2021 edition of NVIDIA GTC. I created a list of interesting sessions for a group of people internally at VMware, but I thought the list might interest people outside VMware as well. It is primarily focused on understanding NVIDIA’s product and services suite, not necessarily on deep diving into technology or geeking out on core counts and speeds and feeds. If you find exciting sessions that I haven’t listed, please leave them in the comments below.
VSPHERE 7 CORES PER SOCKET AND VIRTUAL NUMA
Regularly I meet with customers to discuss NUMA technology, and one of the topics that is always on the list is the Cores per Socket setting and its potential impact. In vSphere 6.5, we made some significant adjustments to the scheduler that allowed us to decouple NUMA client creation from the Cores per Socket setting. Before 6.5, if the vAdmin configured the VM with a non-default Cores per Socket setting, the NUMA scheduler automatically aligned the NUMA client configuration with that Cores per Socket setting, regardless of whether this configuration was optimal for performance in its physical surroundings.
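As an illustration, this is roughly what changing the setting looks like through the vSphere API with pyVmomi. The helper function and the 2 x 8 example topology are mine, purely for illustration.

```python
# Illustrative sketch: reconfiguring a VM's Cores per Socket with pyVmomi.
# 'vm' is assumed to be a vim.VirtualMachine object obtained elsewhere.
from pyVmomi import vim

def set_cores_per_socket(vm, total_vcpus, cores_per_socket):
    """Present total_vcpus as (total_vcpus / cores_per_socket) virtual sockets."""
    assert total_vcpus % cores_per_socket == 0, "vCPUs must divide evenly over sockets"
    spec = vim.vm.ConfigSpec(numCPUs=total_vcpus,
                             numCoresPerSocket=cores_per_socket)
    return vm.ReconfigVM_Task(spec=spec)  # requires the VM to be powered off

# Example: present 16 vCPUs as 2 sockets x 8 cores, mirroring a dual-socket host.
# set_cores_per_socket(vm, total_vcpus=16, cores_per_socket=8)
```

Since 6.5, this setting drives the virtual socket topology the guest OS sees, while the NUMA scheduler sizes its clients independently.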
DRS THRESHOLD 1 DOES NOT INITIATE LOAD BALANCING VMOTIONS
vSphere 7.0 introduces DRS 2.0 and its new load-balancing algorithm. In essence, the new DRS is completely focused on taking care of the needs of the VMs, and it does this at a more aggressive pace than the old DRS. As a result, DRS will resort to vMotioning a virtual machine sooner than the previous DRS, and this is something that a lot of customers are noticing. In highly consolidated clusters, you might see a lot of vMotions occur. I perceive this as an infrastructural service. However, some customers might see this as a turbulent or nervous environment and would rather see fewer vMotions. As a result, these customers like to dial down the DRS threshold, which is the right thing to do. But please be aware: if you still want DRS to provide load-balancing functionality, do not slide the threshold all the way to the left.
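For reference, here is a sketch of what dialing the threshold down programmatically could look like with pyVmomi. Be aware that the API’s vmotionRate (1–5) is commonly reported as inverted relative to the UI slider, so verify the mapping in your environment before applying anything like this.

```python
# Hedged sketch: adjusting the DRS migration threshold on a cluster.
# 'cluster' is assumed to be a vim.ClusterComputeResource object.
from pyVmomi import vim

def set_drs_migration_threshold(cluster, vmotion_rate):
    """Set the cluster's DRS vmotionRate (an integer from 1 to 5)."""
    spec = vim.cluster.ConfigSpecEx(
        drsConfig=vim.cluster.DrsConfigInfo(vmotionRate=vmotion_rate))
    # modify=True merges this change with the existing cluster configuration
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

# Choose a moderate value rather than the most conservative extreme,
# so DRS keeps generating load-balancing recommendations.
# set_drs_migration_threshold(cluster, vmotion_rate=3)
```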
PROJECT MONTEREY AND THE NEED FOR NETWORK CYCLES OFFLOAD FOR ML WORKLOADS
VMworld has started, and that means a lot of new announcements. One of the most significant projects VMware is working on is Project Monterey. Project Monterey allows the use of SmartNICs, also known as Data Processing Units (DPUs), from various VMware partners within the vSphere platform. Today we use the CPU inside the ESXi host to run workloads and to process network operations. With the shift towards distributed applications, the CPUs inside the ESXi hosts need to spend more time processing network IO instead of application operations. This extra utilization impacts data center economics, such as consolidation ratios and availability calculations. On top of this shift from monolithic to distributed applications comes the advent of machine learning-supported services in the enterprise data center.
MACHINE LEARNING INFRASTRUCTURE FROM A VSPHERE INFRASTRUCTURE PERSPECTIVE
For the last 18 months, I’ve been focusing on machine learning, especially on how customers can successfully deploy machine learning infrastructure on a vSphere infrastructure. This space is exciting as it has so many great angles to explore. Besides the model training, a lot happens with the data: data is transformed, data is moved. Data sets are often hundreds of gigabytes in size. Although that doesn’t sound like much compared to modern databases, these data sets are transformed and versioned. Where massive databases nest on an array, these data sets travel through pipelines that connect multiple systems with different architectures, from data lakes to in-memory key-value stores. As a data center architect, you need to think about the various components involved. Where is the compute horsepower needed? How do you deal with an explosion of data? Where do you place particular storage platforms? What kind of bandwidth is needed? And do you always need extreme low-latency systems in your ML infrastructure landscape?
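To give a feel for one such pipeline step, here is a minimal Python sketch that streams a raw file through a transform in chunks rather than loading hundreds of gigabytes into memory at once. The file names and the transform itself are placeholders.

```python
# Illustrative sketch: chunked data wrangling with pandas.
# File names and the transform are stand-ins for a real pipeline step.
import pandas as pd

def transform(chunk: pd.DataFrame) -> pd.DataFrame:
    """Placeholder wrangling step: drop incomplete rows, scale one column."""
    chunk = chunk.dropna()
    chunk["value"] = chunk["value"] / chunk["value"].abs().max()
    return chunk

# Stream the raw CSV through the transform in 1M-row chunks and append
# the result to a new, versioned data set for the next pipeline stage.
with pd.read_csv("raw_dataset.csv", chunksize=1_000_000) as reader:
    for i, chunk in enumerate(reader):
        transform(chunk).to_csv("dataset_v1.csv", mode="a",
                                header=(i == 0), index=False)
```

Even a simple step like this touches storage twice and moves the full data set across the network when source and destination live on different systems, which is exactly why the bandwidth and placement questions above matter.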
CPU PINNING IS NOT AN EXCLUSIVE RIGHT TO A CPU CORE!
https://twitter.com/MrsBrookfield/status/1402955235497287685
Katarina tweeted very expressively about her love/hate (mostly hate) relationship with CPU pinning, and lately I have been in conversations with customers contemplating whether they should use CPU pinning. The analogy that I typically use to describe CPU pinning is the story of the favorite parking space at your office parking lot. CPU pinning limits the compliant CPU “slots” that a vCPU can be scheduled on. So think of that CPU slot as the parking spot closest to the entrance of your office. You have decided that you only want to park in that spot. Every day of the year, that’s your spot and nowhere else. The problem is, this is not a company-wide directive. Anyone can park in that spot; you have just limited yourself to that spot only. So it can happen that Bob arrives at the office first and, lazy as he is, parks as close to the office entrance as he can. Right in your spot. Now the problem with your self-imposed rule is that you cannot and will not park anywhere else. So when you show up (late to the party), you notice that Bob’s car is in YOUR parking spot, and the only thing you can do is drive circles in some holding pattern until Bob leaves the office again. The stupidest scenario: it’s Sunday, and you and Bob are the only ones doing some work. You’re out there in the parking lot, driving in circles, waiting until Bob leaves again, while Bob is inside the empty building waiting for you to get started.
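To ground the analogy in the platform, this is roughly what claiming that parking spot looks like through the vSphere API, as a hedged pyVmomi sketch where 'vm' and the CPU numbers are placeholders.

```python
# Hedged sketch: configuring CPU scheduling affinity ("pinning") with pyVmomi.
# 'vm' is assumed to be a vim.VirtualMachine object. Note that this only
# restricts WHERE the vCPUs may run; it does not reserve those cores,
# just like claiming the parking spot does not stop Bob from using it.
from pyVmomi import vim

def pin_vcpus(vm, physical_cpus):
    """Restrict the VM's vCPUs to the listed physical CPU numbers."""
    spec = vim.vm.ConfigSpec(
        cpuAffinity=vim.vm.AffinityInfo(affinitySet=physical_cpus))
    return vm.ReconfigVM_Task(spec=spec)

# Limit the VM to physical CPUs 0-3. Every other VM can still be scheduled
# there, while this VM can now run nowhere else.
# pin_vcpus(vm, physical_cpus=[0, 1, 2, 3])
```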