
frankdenneman.nl

MACHINE LEARNING ON VMWARE PLATFORM – PART 2

June 8, 2022 by frankdenneman

Resource Utilization Efficiency

Machine learning, especially deep learning, is notorious for consuming large amounts of GPU resources during training. However, as part 1 already highlighted, machine learning is more than just training a model, and the other components of the machine learning workflow require large amounts of CPU, memory, storage, and network resources.

Machine Learning on VMware Cloud Platform – Part 1 covered the three distinct phases: concept, training, and deployment. Existing “known data” is required to explore and train the model in both the concept and training phases. During the development of the model, it is common to use three different data sets: the training set, the validation set, and the testing set. Creating data sets is not only about getting as much data as possible. It is even more critical to get meaningful, high-quality data, because the accuracy of the recommendations produced by the model is highly dependent on the quality of the dataset used for training and validation. The data science team needs to “wrangle” existing raw data into shape to get such a high-quality dataset. Data wrangling transforms the raw data into more valuable data that can be used as a dataset “downstream” to train a model. And all this wrangling requires a lot of collateral infrastructure and services besides just a bunch of GPUs.

Massive datasets are not new to any enterprise IT organization. Systems of record have always been the backbone of business processes. Think back to mainframes in the ’70s and the rise of on-premises ERP systems like SAP and Oracle in the ’80s and ’90s. And today, databases, data warehouses, and data lakes contain petabytes of data. A 100-gigabyte dataset used for training purposes might sound unimpressive. But there is a difference: this dataset is constantly on the move. Bits and bytes do not slowly accrue in a database and sit there waiting to be cherry-picked by a SQL statement. No, these datasets are pulled and pushed through the infrastructure. A dataset will be extracted from multiple sources, transformed by different platforms, and stored and versioned many times over. For example, many ML projects capture and store massive amounts of unstructured data (video, audio, log files) in its native format, which then needs to be structured, transformed, and analyzed. These processes generate a lot of data movement, which, in virtualized platforms, burns (a lot of) x86 cycles and exposes data on the network.

Below is a diagram highlighting some of the structural components of such an environment, omitting many of the essential data science team tools, such as the various collaborative Jupyter notebook solutions, artifact stores, or complete data science or MLOps platforms such as Databricks, Domino, H2O.ai, DataRobot, or Dataiku.

The processes after training belong to the deployment phase. In this phase, the data science team, or the MLOps team, takes a converged model and integrates it into a system or platform that engages with the customer or an end system, like a robot arm or factory installation. A converged model is a model that is trained up to a state where additional training will not improve it. Why not say a finished model? Because as the world changes, the model might no longer reflect the current state of the world. Think about what happened during Covid to the ML models deployed in the hotel, airline, and entertainment industries. They needed some readjustments to reflect the new situation in the world. And because of this, some models need retraining. This retraining does not happen in real time; it depends on the use case and the data flow. Think about self-scanning registers, holiday-season packaging changes for supermarkets, or additional items in a warehouse. In general, most teams start with manual retraining. Still, they want to move to automation: look at CI/CD pipelines, create a cyclical deployment process, retrain new models, and capture unseen data as new training sets.

As you can imagine, running such a platform and its (pipeline) components requires a lot of processing power. On top of that, in many organizations, AI/ML projects are initiated by individual business units attempting to solve their own pressing business challenges. Each team runs its own solution tech stack and develops its models along its own development lifecycle, with its own resource utilization. What we see today is the start of ML-project sprawl. Many ML projects start small, on laptops with small datasets, and slowly grow to become essential to the business. And we attach the same organization and process models to this phenomenon. Someone will detect this, start to ask for more consolidation, start a Machine Learning Center of Excellence, and start to think about centralizing resource utilization.

And this is the right way of thinking. You do not want each team to erect its own isolated platform. We’ve seen this happen in the ’90s, when organizations moved away from mainframes to decentralized x86 servers and a lot of these machines were underutilized. We see now that a lot of data scientists are simply not aware of the power of virtualization. To them, it’s cloud or bare metal. Cloud platforms are great to start with but are too opinionated, and so they turn to bare metal. They leave out the greatest thing since sliced bread (now, I might be a bit biased here).

Let’s use two examples from the concept and training phase. When the data science team performs a hyperparameter search, they use multiple smaller GPU-equipped machines and smaller data sets to find the correct ML model architecture and hyperparameters. Or, when they are using their Spark cluster for intensive data processing tasks, it is typically only heavily used during the pre-processing tasks. The key here is that an ML platform consists of many parts, but at any given moment only one part is heavily utilized. From a resource utilization perspective, we have to deal with load-correlated utilization spikes per model development effort, since many platform parts were originally designed as distributed architectures.

I can imagine that all of the above can sound extremely complex, but in many cases, it isn’t. If organizations centralize their efforts onto a single platform, we have to deal with the noisy-neighbor aspect of this workload. But we have all dealt with noisy neighbors before, and the best part is that many core parts of vSphere have been updated to handle new distributed workloads. For example, DRS 2.0 runs every 60 seconds instead of every 5 minutes to handle containerized workloads, and it focuses on workload happiness instead of balancing host utilization across a vSphere cluster at a high level. The partnership with NVIDIA that brought us NVIDIA AI Enterprise allows us to spatially partition the GPU to isolate compute resources and allow full multi-tenancy.

See the recent blog post by Lan Vu about how the different vGPU technologies can best suit the ML use case.

But we are also working with our partners on newer technologies to handle the heavy IO streams that ML model development will introduce to the vSphere platform. Project Monterey introduces the Data Processing Unit (DPU) into our cluster architecture.

There are many use cases, but the one that excites me is the sheer amount of network IO you can offload with DPUs, and potentially more intelligent things with NVIDIA’s BlueField architecture. Today, every network IO within a virtualization platform consumes x86 cycles of the ESXi host, more than one cycle per IO. Pulling and pushing datasets for many different models through the platform will therefore impact the x86 cycles left over to run the rest of the virtual machines and containers used for the data scientists’ toolchain and for the other employees and services consuming the virtualization platform. By introducing DPUs, you isolate these network IO streams from the compute layer.

Another project with immense potential for the ML platform is Project Capitola, or as the vSphere team calls it, the Software-Defined Memory structure. You want copious amounts of memory during the data pre-processing phase, but you also want it fast. At the same time, you might not want to break the bank and spend your entire annual IT budget on RAM modules. Project Capitola allows you to use different memory technologies and offer different tiers of memory capacity to your workload without rewriting your applications.

The critical challenge is to have a platform that can provide the right resources and attach and detach the resources efficiently and economically so that the data science team and the organization benefit from this. The platform needs to provide the workload environment as quickly as possible (self-service) and be resilient. I’ll dive into risk mitigation in the next part.

If you want to learn more about Project Monterey, Capitola, or the self-service part, please join Cormac Hogan, Duncan Epping, and me at our (Virtual) Roadshow “The ever-evolving VMware Infrastructure”. Ask your VMUG leader about the possibilities.

Filed Under: AI & ML

Machine Learning on VMware Platform – Part 1

May 25, 2022 by frankdenneman

Machine Learning is reshaping modern business. Most VMware customers look at machine learning to increase revenue or decrease cost. When talking to customers, we mainly discuss the (vertical) training and inference stack details. The stack runs a machine learning model inside a container or a VM, preferably on an accelerator device like a general-purpose GPU. And I think that is mostly due to our company DNA, which lets us relate a machine learning workload directly to a hardware resource.

But Machine Learning (ML) allows us to think much broader than that. One of the most cited machine learning papers, “Machine Learning: The High-Interest Credit Card of Technical Debt,” shows us that running ML code is just a tiny part of the extensive and complex hardware and software engineering system. 

Machine Learning Infrastructure Platform

As of today, VMware’s platform already offers a lot of functionality to ML practitioners. We can break down a machine learning platform into horizontal fabrics connecting the vertical training and inference stack. An example of a horizontal fabric can be a data pipeline that runs from ingesting data all the way to feeding the training stack a training data set. Or it can be a pipeline that streams data from a sensor to an ISV pre-trained model and goes all the way to controlling the actions of a robot arm. All the functions, applications, and services are backed by containers and VMs that could run on a VMware platform.

Looking beyond the vertical training stack, we immediately see familiar constructs we have spent years caring for as platform architects. The majority of databases run on vSphere platforms. Many of these contain the data that data scientists want to use to train their models. Your vSphere platform can efficiently and economically run all the services needed to extract and transform the data. The storage services safely and securely store the data close to the computational power. The networking and security services, combined with the core services, can offer intrinsic security to the data stream from the point of data ingestion all the way to the edge location where the model is served.

Using self-service marketplace services, such as the VMware Application Catalog (formerly known as Bitnami), allows IT organizations to work together with the head of data science to curate their ML infrastructure toolchains. Such a service allows each data science team to pull software into their development environment whenever they need it.

Improving the ROI of Underutilized Resources

If we focus on the vertical training and inference stack first, we can apply the same story that I, at least, have been telling for a long time. Instead of talking about x86 cycles and memory utilization, we now focus on accelerator resources. Twenty years ago, we revolutionized the world by detaching the workload from the metal and introducing the virtual machine. The compelling story was that the server hardware was vastly underutilized, typically around 4%. This convinced customers to start virtualizing. By consolidating workloads and using the different peak times of these workloads, the customer could enjoy a better return on investment of the metal. This story applies to today’s accelerators. During GTC Fall, it was quoted that today’s utilization of accelerators is below 15%.

Accelerators are expensive, and with the current supply chain problems, very challenging to obtain, let alone scale out.

The availability and distribution of accelerator resources within the organization might become a focal point once the organization realizes that there are multiple active projects and that the current approach of dedicated resources is not economically sustainable, or once the IT organization is looking to provide a machine learning platform service. Pooling accelerator resources is beneficial to the IT organization from an economic perspective, and it has a positive effect on the data science teams as well. The key to convincing the data science teams is understanding the functional requirements of the phases of the model development lifecycle and deploying an infrastructure that can facilitate those needs.

A machine learning model follows a particular development lifecycle: Concept – Training – Deployment (inference).

Concept phase: The data science team determines what framework and algorithm to use during the concept phase. During this stage, the DS team explores what data is available, where the data lives and how they can access it. They study the idea’s feasibility by running some tests using small data sets. As a result, the team will run and test some code, with lots of idle time in between. The run time will be seconds to minutes when the code is tested. As you can imagine, a collection of bare metal machines assigned to individual data scientists or teams with dedicated expensive GPUs might be overkill for this scenario.

In many cases, the CPU provides enough power to run these tests. Still, if the data science team wants to research the effect and behavior of the combination of the model and the GPU architecture, virtualization can be beneficial. This is where pooling and abstraction, two core tenets of VMware’s DNA, come into play. We can consolidate the efforts of different data science teams in a centralized environment and offer fractional GPUs (NVIDIA vGPU) or remoting with VMware Bitfusion (interception and remoting of the CUDA API). Or use multi-instance GPU (MIG) functionality that allows for stricter resource isolation and thus predictable performance. These functionalities allow the organization to efficiently and economically provide an infrastructure for the data science teams to run their workloads while they are in the testing phase of developing a particular model.

Training phase: A virtualized platform is also advantageous for the training phase of the model. During the training phase of the model development lifecycle, the data science team pushes workloads for longer times, which is throughput-oriented. Typically, they opt for larger pools of parallel processing power. In this phase, CPU resources aren’t cutting it anymore, and most of the time, a part of a single physical GPU isn’t enough either. Depending on their performance needs, the customer can use remoting technology (Bitfusion) to access pools of GPU cards or use PCI passthrough technology to run the workload directly on one or more physical devices.

But the hungry nature of the workload still does not warrant a dedicated infrastructure, as training jobs are transient in nature. Sometimes a training job takes 20 hours, and sometimes 200 hours. The key point is that GPU resources sit idle after the training job is finished, while the data science team reviews the results and discusses which hyperparameters to tune. This situation is very inefficient: the costly accelerators are idling while other data science teams struggle for resources. Virtualization platforms can democratize the resources and help scale out resources for the data science teams. Convincing data science teams to “give up” their dedicated infrastructure is a matter of helping them understand the nature of virtualization and how pooling and abstraction can benefit their needs.

Instead of having isolated pools of GPUs scattered throughout the organization, virtualization allows data science teams to have a centralized pool of resources at their disposal. When they need to run a training job, they can allocate far more (spare and unused) resources from that pool than they could have allocated in their dedicated workstation. In return for giving up their dedicated resources, they have a larger pool of resources at their disposal, reducing the duration of training.

Deployment phase: The deployment phase is typically performance-focused in nature. This is where we meet the age-old discussion of hypervisor overhead. The hypervisor introduces some overhead, but the performance teams are working on reducing this gap. We are now at the single-digit mark (roughly 6%), and in some cases near bare-metal performance, and thus other benefits should be highlighted, such as the ability to deal with heterogeneous architectures by using virtual machines for workloads or running Kubernetes in VMs. The VMware ecosystem allows for deploying and managing at scale and focuses on delivering lifecycle management at scale. Portability and mobility of VMware’s workload constructs: all these unique capabilities matter to mature organizations and offset a single-digit performance loss.

The Data Science Team

In general, the data science team ultimately focuses on delivering machine learning models; their MBOs state that they must deliver these models as fast as possible, with the highest level of accuracy. Although light coding (Python, R, Scala) is a part of their daily activities, most data scientists do not have a software engineering background and thus should be seen as working a level of abstraction above developers. As a result, our services should not treat them as developers but align more with the low-code, no-code trend seen in the AI world today. Data scientists do not and should not have a deep understanding of infrastructure architectures. Their lowest level of abstraction is predominantly a Docker container. Kubernetes is, in most cases, a “vendor” requirement. Any infrastructure layer underneath the container should be of no concern to the data scientist, and our platform should align with that thought.

The data scientist should not have to learn how to optimally configure their data and machine learning platform to operate on the VMware platform. The VMware Cloud platform should abstract these decisions away from the data scientist and provide (automatic) constructs for MLOps teams, IT architects, and central operations teams to run these applications and services optimally.

Data scientists are primarily unaware of IT policies. They have machines and software that are often not included in the corporate IT service catalog. This situation can be a massive security liability, as they consistently work with the crown jewels of the company: highly sensitive company data. That data is exposed and usually not well secured, a prime target for ransomware or other malicious practices. One of the examples that our field teams repeatedly see is that this data is often stored on very expensive, shiny Alienware laptops. This data needs to be protected and should not be stored on high-risk target devices.

These devices are bought because the current IT platforms lack the hardware accelerators or any collaborative containerized service platform. Theft of these devices leaks company data and halts model development progress. Besides both of these problems, an organization does not want its highly paid data science team twiddling its thumbs. Having a platform ready to cater to the needs of the data science teams, in both software and hardware requirements, eliminates these threats and helps the organization reap the many benefits of a centralized platform.

The current method of providing a data science team with highly specialized, portable equipment is an extreme liability from a security perspective. Many data science teams don’t see the centralized IT team as an enabler due to the lack of service alignment. Providing a centralized platform with the right services and equipment can help the organization significantly reduce the security risk while improving the data science teams’ ability to deliver a model by leveraging all the advantages of a centralized platform.

Centralizing ML Infrastructure Platforms

If we take it one step further, you can obtain a higher level of economics by running the ML infrastructure platforms on shared infrastructure instead of dedicated physical machines. As described in the previous paragraph, data science teams in many organizations operate with full autonomy and have dedicated resources at their disposal. Individual business groups within the organization are forming their own data science teams, and due to this decentralized effort, scattered high-powered machines around the organization create pools of fragmented compute and storage resources. By centralizing the ML infrastructure services workload of the different teams on one platform, you benefit from the many advantages the VMware ecosystem provides:

  • Resource Utilization Efficiency
  • Risk Mitigation
  • Workload Construct Standardization and Life Cycle Management
  • Data adjacency
  • Intrinsic security for data streams
  • Open platform

But I’ll leave these topics for part 2 of this series.

Filed Under: AI & ML

Solving vNUMA Topology Mismatch When Migrating between Dual Socket Servers and Quad Socket Servers

March 11, 2022 by frankdenneman

I recently received a few questions from customers migrating between clusters with different CPU socket footprints. The challenge is not necessarily migrating live workloads between clusters because we have Enhanced vMotion Compatibility (EVC) to solve this problem. 

For VMware users just learning about this technology: EVC masks certain unique features of newer CPU generations and creates a generic baseline of CPU features throughout the cluster. When workloads move between two clusters, vMotion still checks whether the same CPU features are presented to the virtual machine. If you are planning to move workloads, ensure the EVC modes of the clusters match to get the smoothest experience.
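As a quick sanity check before such a migration, a minimal PowerCLI sketch (the vCenter address and cluster names below are placeholders) lists the EVC mode per cluster so you can compare source and destination:

# Compare the EVC baseline of the source and destination clusters (hypothetical names)
Connect-VIServer -Server vcenter.lab.local
Get-Cluster -Name "Cluster-DualSocket", "Cluster-QuadSocket" | Select-Object Name, EVCMode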

The challenge when moving live workloads between ESXi hosts with different socket configurations is that the vNUMA topology of the virtual machine no longer matches the physical topology. A virtual NUMA topology consists of two components. The component that presents the CPU topology to the virtual machine is called the VPD (virtual proximity domain). The VPD exists to help the guest OS and the applications optimize their CPU scheduling decisions; this VPD construct is essentially the virtual NUMA topology. The other component, the PPD (physical proximity domain), groups the vCPUs and helps the NUMA scheduler with placement decisions across the physical NUMA nodes.

The fascinating part of this story is that the VPD and PPD are closely linked, yet they can differ if needed. The scheduler attempts to mirror the configuration between the two elements; the PPD configuration is dynamic, but the VPD configuration always remains the same. From the moment the VM is powered on, the VPD configuration does not change. And that is a good thing, because operating systems generally do not like to see whole CPU layouts change. Adding a core with CPU hot add is all right, but drastically rearranging caches and socket configurations is pretty much a bridge too far.

As mentioned before, the VPD remains the same. Still, the NUMA scheduler can reconfigure the PPD to optimize the vCPU grouping for the CPU scheduler. When will this happen? When you move a VM to a host with a different physical CPU configuration, i.e., a different socket count or a different number of physical cores per socket. This way, ESXi still squeezes out the best performance it can in this situation. The drawback is the mismatch between presentation and scheduling.

This functionality is great as it allows workloads to enjoy mobility between different CPU topologies without any downtime. However, we might want to squeeze out all the performance possible. Some vCPUs might not share the same cache, although the application thinks they do. Or some vCPUs might not even be scheduled together in the same physical NUMA node, experiencing latency and bandwidth reduction. To be more precise, this mismatch can impact memory locality and the action-affinity load-balancing operations of the scheduler. Thus, it can impact VM performance and create more inter-CPU traffic. This impact might be minor on a per-VM basis, but you have to think at scale, the combined performance loss of all the VMs, so for larger environments it might be worthwhile to get it fixed.

I’ve created a 36 vCPU VM on a dual-socket system with twenty physical CPU cores per socket. The power-on process of the virtual machine creates the vNUMA topology and writes a number of entries to the VMX file. Once the VM powers on, the VMX file receives the following entries:

numa.autosize.cookie = "360022"
numa.autosize.vcpu.maxPerVirtualNode = "18"

The key entry for this example is numa.autosize.vcpu.maxPerVirtualNode = "18": with 36 vCPUs on two 20-core sockets, the NUMA scheduler distributes the vCPUs evenly across the sockets (36 / 2 = 18 per virtual node), as it likes to spread the vCPUs across as many physical cores as possible.
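If you want to verify what a running VM received at power-on, you can read these entries back via PowerCLI; a small sketch, assuming a hypothetical VM name:

# Read the auto-sized vNUMA entries back from the VM's advanced settings
$vm = Get-VM -Name "big-vm-36vcpu"
Get-AdvancedSetting -Entity $vm -Name "numa.autosize.cookie", "numa.autosize.vcpu.maxPerVirtualNode"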

But what happens if this virtual machine moves to a quad-socket system with 14 physical cores per socket? A virtual node of 18 vCPUs no longer fits within a 14-core socket, so the NUMA scheduler creates three scheduling constructs (PPDs) to distribute those vCPUs across the NUMA nodes, but it keeps the presentation layer (the two-node VPD) the same so as not to confuse the guest OS and the applications.

Since the vNUMA topology is created during a VM’s power-on, we would have to shut down the virtual machine and power it back on to realign the VPD and PPD topology. Well, since 2019, we don’t need to power down the VM anymore! And I have to admit, I only found out about it recently. Bob Plankers (not this Bob) writes about the vmx.reboot.PowerCycle advanced parameter here. With this setting, a guest OS reboot is enough; a manual power cycle is no longer required.

That means that if you are in the process of migrating your VM estate from dual-socket systems to quad-socket systems, you can add the following adjustments to the VMX file while the VM is running (for example, via PowerCLI and New-AdvancedSetting):

vmx.reboot.PowerCycle = true
numa.autosize.once = false
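As a rough sketch of that PowerCLI route, applying both entries to the running VM (again assuming a hypothetical VM name) could look like this:

# Add both advanced settings to the VMX while the VM stays powered on
$vm = Get-VM -Name "big-vm-36vcpu"
New-AdvancedSetting -Entity $vm -Name "vmx.reboot.PowerCycle" -Value "true" -Force -Confirm:$false
New-AdvancedSetting -Entity $vm -Name "numa.autosize.once" -Value "false" -Force -Confirm:$false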

The setting vmx.reboot.PowerCycle will remove itself from the VMX file, but numa.autosize.once = false stays behind, so it’s best to remove that entry from the VMX file afterwards; you might want to track this. Just as with adding the settings, you can remove them while the VM is up and running.
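Again as a sketch, cleaning up the leftover entry while the VM keeps running:

# Remove the leftover numa.autosize.once entry after the vNUMA topology has been realigned
Get-AdvancedSetting -Entity (Get-VM -Name "big-vm-36vcpu") -Name "numa.autosize.once" | Remove-AdvancedSetting -Confirm:$false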

When you have applied these settings to the VMX file, the vNUMA topology will be changed the next time the VM reboots. As always, keep in mind that older systems might react more dramatically than newer systems. After all, you are changing the hardware topology of the system. It might upset an older Windows system or the optimizations of an older application. Some older operating systems do not like this and will need to reconfigure themselves or need some help from the IT ops team. In the worst-case scenario, it will treat the customer to a BSOD. With that in mind, it’s recommended to work with customers running old OSes and figure out a test and migration plan.

Special thanks to Gilles Le Ridou for helping me confirm my suspicion and for testing scenarios in his environment. #vCommunity!

Filed Under: NUMA

Stop designing your server platform with solely the CPU roadmap in mind

December 20, 2021 by frankdenneman

Over the last 20 years, we designed our core data center platform following the CPU roadmap. But in today’s world, the devices attached to the processor are making radical and revolutionary improvements, catering to the needs of new workloads. I’m talking about devices like the GPU, the network adapter, and its natural offspring, the data processing unit (DPU). In the article “Project Monterey and the need for network cycles offload for ML workloads” I zoom in on what’s in store for us data center architects in the upcoming years.

To service the requests of these new workloads, we need to move away from designing a platform solely based on a CPU roadmap and plugging these devices into a server as an afterthought. When designing a platform for these new workloads, we have to start designing data center systems holistically.

Together with Luke Wignall from NVIDIA, we (Duncan Epping, Johan van Amersfoort, and I) discuss DPU technology and other efforts to run and manage modern workloads in episode 5 of the Unexplored Territory podcast.

Apple: apple.co/3lYZGCF

Google: bit.ly/3oQVarH

Spotify: spoti.fi/3INgN3R

Or anywhere else where you get your podcasts!

Filed Under: AI & ML Tagged With: AI, ML, NVIDIA

Exciting Sessions from NVIDIA GTC Fall 2021

December 9, 2021 by frankdenneman

Over the last few weeks, I watched many sessions of the NVIDIA Fall version of GTC. I created a list of interesting sessions for a group of people internally at VMware, but I thought the list might interest some outside VMware. It’s primarily focused on understanding NVIDIA’s product and services suite and not necessarily deep diving into technology or geeking out on core counts and speeds and feeds. If you found exciting sessions that I haven’t listed, please leave them in the comments below.

Data Science

Accelerating Data Science: State of RAPIDS [A31490]

Reason to watch: A 55-minute overview of the state of RAPIDS (the open-source framework for data science), upcoming features, and improvements

Inference

Please note: Triton is part of the NVIDIA AI Enterprise stack (NVAIE)

How Hugging Face Delivers 1 Millisecond Inference Latency for Transformers in Infinity [A31172]

Reason to watch: A 50-minute session. Hugging Face is the dominant player for NLP Transformers. Hugging Face is pushing the open platform / democratizing ML message.

Scalable, Accelerated Hardware-agnostic ML Inference with NVIDIA Triton and Arm NN [A31177]

Reason to watch: A 50-minute session covering the Arm NN architecture and deploying models on far-edge devices (Jetson/Pi) using NVIDIA Triton Inference Server

Deploy AI Models at Scale Using the Triton Inference Server and ONNX Runtime and Maximize Performance with TensorRT [A31405]

Reason to watch: A 50-minute overview of Triton architecture, features, customer case studies, and ONNX Runtime integration. It covers ONNX RT, which provides optimization for target platforms (inference).

NVIDIA Triton Inference Server on AWS: Customer success stories and AWS deployment methods to optimize inference throughput, reduce latency, and lower GPU or CPU inference costs. [SE31488]

Reason to watch: A 45-minute session covering Triton on AWS SageMaker, with two customers sharing their deployment overviews and lessons learned.

No-Code Approach

NVIDIA TAO: Create Custom Production-Ready AI Models without AI Expertise [D31030]

Reason to watch: A 3-minute overview of TAO, a model adaptation framework that can fine-tune pre-trained models by feeding in proprietary (smaller) datasets and optimizing them for the inference hardware architecture.

AI Life-cycle Management for the Intelligent Edge [A31160]

Reason to watch: A 50-minute session that covers NVIDIA’s approach to Transfer Learning: NVIDIA provides a pre-trained model, customers optimize the model, NVIDIA TAO assists with further optimization for inference deployment, and Fleet Command orchestrates the deployment of the model at the edge.

A Zero-code Approach to Creating Production-ready AI Models [A31176]

Reason to watch: A 35-minute session that explores TAO in more detail and provides a demo of the TAO (GUI-based) solution.

NVIDIA LaunchPad

Simplifying Enterprise AI from Develop to Deploy with NVIDIA LaunchPad [A31455]
Reason to watch: A 30-minute overview of the NVIDIA Cloud AI platform delivered through their partnership with Equinix. (Rapid testing and prototyping AI)

How to Quickly Pilot and Scale Smart Infrastructure Solutions with LaunchPad and Metropolis [A31622]

Reason to watch: A 30-minute overview of how to use Metropolis (Computer Vision AI Application Framework) and LaunchPad to accelerate POCs (Zero Touch Testing and System Sizing). The session covers Metropolis certification for system design (TS: 11:30) and bare-metal access.

NVIDIA and Cloud Integration

From NGC to MLOps with NVIDIA GPUs and Vertex AI Workbench on Google Cloud (Presented by Google Cloud) [A31680]

Reason to watch: A 40-minute overview of NGC integration with Google Vertex AI (Google’s end-to-end ML/AI platform)

Automate Your Operations with Edge AI (Presented by Microsoft Azure) [A31707]

Reason to watch: A 15-minute overview of Azure Percept. Azure Percept is an edge AI solution for IoT devices, now available on the Azure HCI stack.

Fast Provisioning of Kubernetes Clusters for the AI/ML Developer on VMware: Technical Details (Presented by VMware) [A31659]

Reason to watch:  A 45-minute technical overview of the NVIDIA and VMware partnership demonstrating the key elements of the NVIDIA AI-Ready Enterprise Platform in detail.

NVIDIA EGX

Exploring Cloud-native Edge AI [A31166]

Reason to watch: A 50-minute overview of NVIDIA’s cloud-native platform (Kubernetes based, edge AI platform)

Retail

The One Retail AI Use Case That Stands Out Above the Rest [A31548]

Reason to watch: A 55-minute session providing great insights into a real use case of Everseen technology deployed at Kroger

One of the World’s Top Retailers is Bringing AI-Powered Convenience to a Store Near You [A31359]

Reason to watch: A 25-minute session providing insights into the challenges of deploying and using autonomous store technology.

DPU

Real-time AI Processing at the Edge [A31164]

Reason to watch: A 40-minute session covering the GPU+DPU converged accelerators (A40X and A100X), their architecture, their DOCA (Data Center on a Chip Architecture), and the CUDA architecture and programming environment.

Programming the Data Center of the Future Today with the New NVIDIA DOCA Release [A31069]

Reason to watch:  A 40-minute session covering DOCA architecture in detail. 

NVIDIA AI Enterprise with VMware vSphere: Combining NVIDIA GPU’s Superior Performance, NVIDIA AI Software, and Virtualization Benefits for AI Workflows (Presented by VMware, Inc.) [A31694]

Reason to watch: A 50-minute session offering an excellent explanation of the different vGPU modes (native vs. MIG mode). Starting from the 37-minute time stamp, the presenters dive into the use of network function virtualization on SmartNICs.

Developer / Engineer Type Sessions

CUDA New Features and Beyond [A31399]

Reason to watch: A 50-minute overview of what’s new in the CUDA toolkit.

Developing Versatile and Efficient Cloud-native Services with Deepstream and Triton Inference Server [A31202]

Reason to watch: A 50-minute deep-dive session on the end-to-end pipelines for vision-based AI

Accelerating the Development of Next-Generation AI Applications with DeepStream 6.0 [A31185]

Reason to watch: A 50-minute overview of the DeepStream solution. DeepStream helps to develop and deploy vision-based AI

End-to-end Extremely Parallelized Multi-agent Reinforcement Learning on a GPU [A31051]

Reason to watch: A 40-minute deep-dive session on how Salesforce worked on a framework to drastically reduce CPU-GPU communication.

Filed Under: AI & ML Tagged With: AI, ML, NVIDIA
