VSPHERE 8 AND VSAN 8 UNEXPLORED TERRITORY PODCAST DOUBLE HEADER
This week we released two episodes covering the vSphere 8 and vSAN 8 releases. Together with Feidhlim O’Leary, we explore all the new functions and features of the vSphere 8 platform. Pete Koehler repeats his stellar performance from last time and helps us understand the completely new architecture of vSAN 8. You can listen to both episodes on Spotify, Apple, on our website unexploredterritory.tech, or anywhere else you get your podcasts!
UNEXPLORED TERRITORY - VMWARE EXPLORE USA SPECIAL
This week Duncan and I attended VMware Explore to co-present the session “60 Minutes of Virtually Speaking Live: Accelerating Cloud Transformation” with William Lam and our buddies from the Virtually Speaking Podcast, Pete Flecha and John Nicholson. The recordings should be made available soon, or you can sign up for the session in Barcelona. During the week, we caught up with many people and captured soundbites from people such as Kit Colbert, Chris Wolf, Stephen Foskett, Sazzala Reddy, and a few more. You can listen to this special VMware Explore episode on Spotify (spoti.fi/3cITI7p), Apple (apple.co/3q35dJJ), or our website: unexploredterritory.tech/episodes/
NEW VSPHERE 8 FEATURES FOR CONSISTENT ML WORKLOAD PERFORMANCE
vSphere 8 is full of enhancements. Go to blogs.vmware.com or yellow-bricks.com for more extensive overviews of the vSphere 8 release. In this article, I want to highlight two features of the new vSphere 8 version that help machine learning (ML) workloads perform consistently, and possibly faster than manually configured workload constructs: the UI enhancements for the vNUMA topology and Device Groups.

Hardware 20 Scalability Enhancements

Before we dive into these features, note that vSphere 8 introduces a new virtual hardware version that allows us to introduce wonderful new things and push the boundaries of the platform again. With vSphere 8, the virtual hardware level advances to version 20, which offers new capabilities for ML accelerators. Support for DirectPath I/O devices per VM went up from 16 to 32. We also worked with NVIDIA to increase support for vGPU devices, and now, with vSphere 8, each VM can support up to 8 vGPU devices.
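To give a feel for the sizing logic behind a vNUMA topology, here is a deliberately simplified sketch (my own illustration, not ESXi's actual algorithm, which also considers settings such as cores per socket and CPU hot add): a vNUMA topology is exposed when the vCPU count no longer fits inside a single physical NUMA node.

```python
import math

def vnuma_nodes(vm_vcpus: int, cores_per_pnuma_node: int) -> int:
    """Estimate how many virtual NUMA nodes a VM would span.

    Simplified model: if the vCPU count exceeds the core count of a
    single physical NUMA node, the VM spans multiple nodes; otherwise
    it fits in one.
    """
    return max(1, math.ceil(vm_vcpus / cores_per_pnuma_node))

# A 24-vCPU VM on a host with 16 cores per NUMA node spans 2 nodes.
print(vnuma_nodes(24, 16))  # -> 2
# A 12-vCPU VM fits in a single node.
print(vnuma_nodes(12, 16))  # -> 1
```

The new vNUMA topology UI in vSphere 8 surfaces exactly this kind of information, so you no longer have to reason it out by hand.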
TRAINING VS INFERENCE - NETWORK COMPRESSION
This training versus inference workload series provides platform architects and owners insights into ML workload characteristics. Instead of treating deep neural networks as black-box workloads, ML architectures and techniques are covered with infrastructure experts in mind. A better comprehension of the workload opens up the dialog between infrastructure and data science teams, hopefully resulting in a better match between workload requirements and platform capabilities. Part 3 of the series focused on the memory consumption of deep learning neural network architectures. It introduced the different types of operands (weights, activations, gradients) and how each consumes memory and requires computational power throughout the different stages and layers of the neural network. Part 4 showed that the floating-point data type impacts a neural network’s memory consumption and computational power requirement. In this part of the training versus inference workload deep dive, I want to cover neural network compression. The goal of neural network compression is inference optimization: either to help fit and run a model on a constrained endpoint or to reduce the running costs of inference infrastructure.
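As a toy illustration of one common compression technique, magnitude-based weight pruning, the sketch below (my own minimal example, not code from the article) zeroes out the smallest weights of a layer and reports the resulting sparsity. Sparse weights can then be stored and served far more compactly.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # one dense layer's weights
pruned = prune_by_magnitude(w, sparsity=0.9)
print(f"sparsity: {np.mean(pruned == 0):.2f}")  # roughly 0.90
```

Real frameworks typically retrain (fine-tune) after pruning to recover accuracy; this sketch only shows the compression step itself.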
HOW TO WRITE A BOOK - SHOW UP DAILY
During the Belgium VMUG, I talked with Jeffrey Kusters and the VMUG leadership team about the challenges of writing a book. Interestingly enough, since that VMUG, the question of how to start writing a book keeps appearing regularly in my inbox, DMs, and LinkedIn messages. This morning, Michael Rebmann’s question convinced me that it’s book-writing season again, so maybe it’s better to put my response in a central place. https://twitter.com/_michaelrebmann/status/1553498293736538116
TRAINING VS INFERENCE - NUMERICAL PRECISION
Part 3 focused on the memory consumption of a CNN and revealed that neural networks require parameter data (weights) and input data (activations) to generate the computations. Most machine learning is linear algebra at its core; therefore, training and inference rely heavily on the arithmetic capabilities of the platform. By default, neural network architectures use the single-precision floating-point data type for numerical representation. However, modern CPUs and GPUs support various floating-point data types, which can significantly reduce memory consumption and arithmetic bandwidth requirements, leading to a smaller footprint for inference (production placement) and reduced training time.
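A quick way to see the effect of numerical precision on memory footprint (an illustrative NumPy sketch of my own; real frameworks apply mixed precision more selectively) is to store the same weight tensor in different floating-point data types and compare sizes:

```python
import numpy as np

# The same one-million-parameter weight tensor in three precisions.
params = 1_000_000
for dtype in (np.float64, np.float32, np.float16):
    w = np.zeros(params, dtype=dtype)
    print(f"{w.dtype.name:>8}: {w.nbytes / 1024 / 1024:.1f} MiB")
# float64 needs 8 bytes per parameter, float32 4 bytes, float16 2 bytes,
# so each step down in precision halves the memory footprint.
```

Arithmetic bandwidth benefits similarly: moving half as many bytes per operand means the same memory subsystem can feed the compute units twice as many values.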
TRAINING VS INFERENCE - MEMORY CONSUMPTION BY NEURAL NETWORKS
This article dives deeper into the memory consumption of deep learning neural network architectures. What exactly happens when an input is presented to a neural network, and why do data scientists struggle with out-of-memory errors so often? Besides Natural Language Processing (NLP), computer vision is one of the most popular applications of deep learning networks. Most of us use a form of computer vision daily. For example, we use it to unlock our phones using facial recognition or to exit parking structures smoothly using license plate recognition. It’s used to assist with your medical diagnosis. Or, to end this paragraph on a happy note, to find all the pictures of your dog on your phone.
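As a back-of-the-envelope illustration of where that memory goes (a hypothetical sketch of my own, not calculations from the article), the snippet below computes the activation memory of a single convolutional layer from its output dimensions. Summing this across all layers of a deep network, for every image in the batch, is what drives data scientists into out-of-memory territory.

```python
def conv_activation_bytes(batch, out_channels, out_h, out_w, bytes_per_element=4):
    """Memory needed to store one layer's output feature maps (activations).

    Assumes dense storage of float32 values (4 bytes per element) by default.
    """
    return batch * out_channels * out_h * out_w * bytes_per_element

# Example: a batch of 32 images through a layer producing 64 feature maps
# of 112x112, stored as float32.
mib = conv_activation_bytes(32, 64, 112, 112) / (1024 ** 2)
print(f"{mib:.0f} MiB")  # 98 MiB for just one layer's activations
```

During training the picture is worse still: activations of every layer must be kept around for the backward pass, and gradients add their own copy of the parameter footprint.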
UNEXPLORED TERRITORY PODCAST EPISODE 19 - DISCUSSING NUMA AND CORES PER SOCKET WITH THE MAIN CPU ENGINEER OF VSPHERE
Richard Lu joined us to talk about the basics of NUMA and Cores per Socket: why modern Windows and macOS systems have a default setting of 2 cores per socket, how cores per socket help the guest OS interpret the cache topology better, the impact of incorrectly configured NUMA and Cores per Socket settings, and many other interesting CPU-related topics. Enjoy another deep-dive episode; you can listen to and download the episode on the following platforms:
MACHINE LEARNING ON VMWARE PLATFORM – PART 3 - TRAINING VERSUS INFERENCE
Machine Learning on VMware Cloud Platform – Part 1 covered the three distinct phases: concept, training, and deployment. Part 2 explored the data streams, the infrastructure components needed, and how vSphere can help increase the resource utilization efficiency of ML platforms. In this part, I want to go a bit deeper into the territory of training and inference workloads. You should consider the platform’s purpose when building an ML infrastructure. Are you building it to serve inference workloads, or are you building a training platform? Are there data science teams inside the organization that create and train the models themselves, or will pre-trained models be acquired? Where will the trained (converged) model be deployed? Will it be in the data center, at industrial sites, or in retail locations?
UNEXPLORED TERRITORY PODCAST EPISODE 18 - NOT JUST ARTIFICIALLY INTELLIGENT FEATURING MAZHAR MEMON
In this week’s Unexplored Territory Podcast, we have Mazhar Memon as our guest. Mazhar is one of the founders of VMware Bitfusion and the principal inventor of Project Radium. In this episode, we talk to him about the start of Bitfusion, the challenges Project Radium solves, and the role the CPU plays in an ML world. If you like deep-dive podcast episodes, grab a nice cup of coffee or any other beverage to your liking, open your favorite podcast app, strap in, and press play.