Welcome.
THE DYNAMIC WORLD OF LLM RUNTIME MEMORY
When meeting with customers and architectural teams, we often perform a specific exercise to separate a model’s static consumption (its weights) from its dynamic runtime consumption. In the unpredictable world of production AI, where concurrent users, complex system prompts, and varying RAG content create constant flux, it is easy to view memory as an elusive target. This article is designed to move your service level from probabilistic to deterministic concurrency. To make this accessible to those managing the hardware, I have intentionally used language common to system administrators rather than data scientists. Instead of focusing on the mathematical constructs of vectors and matrices, we will use the term representations to highlight the actual memory consumption of these data structures.
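To make the split between static and dynamic consumption concrete, here is a minimal sketch of the arithmetic. It is not necessarily the method used in the article. It assumes a commonly used approximation in which the dynamic part is dominated by the key/value cache, and it ignores activations and framework overhead. The `ModelShape` class, the function names, and the example numbers (a 7-billion-parameter model with 32 layers in 16-bit precision on a 24 GiB GPU) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of a model, used only for this estimate."""
    params: int           # total number of weights
    layers: int           # number of transformer layers
    kv_heads: int         # number of key/value heads per layer
    head_dim: int         # size of each head
    bytes_per_value: int  # 2 for 16-bit precision


def static_bytes(m: ModelShape) -> int:
    # Static consumption: the weights, loaded once, independent of users.
    return m.params * m.bytes_per_value


def kv_bytes_per_token(m: ModelShape) -> int:
    # Dynamic consumption per token: one key and one value representation
    # (hence the factor 2) for every layer and every KV head.
    return 2 * m.layers * m.kv_heads * m.head_dim * m.bytes_per_value


def max_concurrent_requests(m: ModelShape, gpu_bytes: int,
                            tokens_per_request: int) -> int:
    # Whatever is left after loading the weights is the budget for requests.
    free = gpu_bytes - static_bytes(m)
    if free <= 0:
        return 0
    return free // (kv_bytes_per_token(m) * tokens_per_request)


# Assumed example values, not measurements.
example = ModelShape(params=7_000_000_000, layers=32, kv_heads=32,
                     head_dim=128, bytes_per_value=2)
gpu_24_gib = 24 * 1024**3

print(static_bytes(example))        # bytes consumed by the weights
print(kv_bytes_per_token(example))  # bytes consumed per token in flight
print(max_concurrent_requests(example, gpu_24_gib, tokens_per_request=4096))
```

With a fixed token budget per request, the concurrency limit becomes a simple division rather than a guess, which is the sense in which sizing moves from probabilistic to deterministic.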
TALKING VCF 9 AND PRIVATE AI FOUNDATION ON THE UNEXPLORED TERRITORY PODCAST
Just before VMware Explore, I joined the Unexplored Territory Podcast to talk about the enhancements in VMware Cloud Foundation 9 and the Private AI Foundation with NVIDIA. We covered new functionality, such as Agent Builder, and walked through the broader enhancements for AI workloads. We also highlighted a few must-attend sessions at Explore. You can listen to the full episode on Apple Podcasts or Spotify. During Explore, many people told me this episode was a great starting point to wrap their heads around VMware Private AI Foundation. If you’re looking for a concise way to catch up, this is a good place to begin.
WHICH MULTI-GPU CONFIGURATIONS ARE YOU PLANNING TO DEPLOY?
During VMware Explore, numerous conversations highlighted that most customers plan to deploy systems with two or more GPUs. The next challenge is deciding which type of multi-GPU configuration to adopt, a choice that depends on intra-node communication, inter-node interconnects, and cooling strategies. To better understand where organizations are heading, I’ve created a short survey. The diagram below illustrates the options available in the NVIDIA-certified systems portfolio, which I use as a reference point in the questions. Your feedback will help map out how different configurations are being considered and provide valuable input as we align our product strategy with customer needs.
How to Read the Diagram
VMWARE PRIVATE AI FOUNDATION - PRIVACY AND SECURITY BEST PRACTICES WHITE PAPER
I’m excited to announce the release of my latest white paper, “VMware Private AI Foundation - Privacy and Security Best Practices.” As many of you know, the world of artificial intelligence is rapidly evolving, and with that comes a new set of challenges, particularly around privacy and security. This white paper is not just about theory. It’s a practical guide introducing the foundational concepts, frameworks, and models underpinning private AI security. It’s a deep dive into the critical aspects of privacy and security in the context of AI, providing you with the tools to implement these principles in your own work. You’ll learn about the principle of shared responsibility, threat modeling for Gen-AI applications, and the CIA triad – confidentiality, integrity, and availability – as a guiding model for information security.
THE MISCONCEPTION OF SELF-LEARNING CAPABILITIES OF LARGE LANGUAGE MODELS DURING PRODUCTION
I enjoyed engaging with many customers about bringing Gen-AI to the on-prem data center at VMware Explore. Many customers want to keep their data and IP between the four walls of their organization, and rightly so. With VMware Private AI Foundation, we aim to utilize foundation models, such as Llama 2, StarCoder, and Mistral 7B, and build upon the great work of many smart data scientists. Instead of building and training a large language model (LLM) from the ground up, which can be time-consuming and computationally expensive, organizations can leverage foundation models pre-trained on a massive dataset of text and code. If necessary, organizations can further fine-tune a foundation model on specific tasks and data in a short period of time.
GEN AI SESSIONS AT EXPLORE BARCELONA 2023
I’m looking forward to next week’s VMware Explore conference in Barcelona. It’s going to be a busy week. Hopefully, I will meet many old friends, make new friends, and talk about Gen AI all week. I’m presenting a few sessions, listed below, and meeting with customers to talk about the VMware Private AI Foundation. If you are interested and you see me walk by, come over and have a chat.
BASIC TERMINOLOGIES LARGE LANGUAGE MODELS
Many organizations are in the process of deploying large language models to apply to their use cases. Publicly available Large Language Models (LLMs), such as ChatGPT, are trained on publicly available data through September 2021. However, they are unaware of proprietary private data. Such information is critical to the majority of enterprise processes. To help an LLM become a useful tool in the enterprise space, it is further trained or fine-tuned on proprietary data to adapt to organization-specific concepts.
MY SESSIONS AT VMWARE EXPLORE 2023 LAS VEGAS
Next week we are back in Las Vegas. Busy times ahead with meeting customers, old friends, making new friends, and presenting a few sessions. Next week I will present at Customer Technical Exchange (CTEX), {code}, and host two meet-the-expert sessions. I will also participate as a part-time judge at the {code} hackathon.
Breakout Sessions
45 Minutes of NUMA (A CPU is not a CPU Anymore) [CODEB2761LV]
Tuesday, Aug 22, 2:45 PM - 3:30 PM PDT
Level 4, Delfino 4003
VSPHERE ML ACCELERATOR SPECTRUM DEEP DIVE – INSTALLING THE NVAIE VGPU DRIVER
After setting up the Cloud License Service Instance, the NVIDIA AI Enterprise vGPU driver must be installed on the ESXi host. Using the same driver version on all ESXi hosts in the cluster that contain NVIDIA GPU devices is recommended. The most common error during the GPU install process is using the wrong driver, and it’s an easy mistake to make. In vGPU version 13 (the current NVAIE version is 15.2), NVIDIA split its ESXi host vGPU driver into two kinds: a standard vGPU driver component that supports graphics, and an AI Enterprise (AIE) vGPU component that supports compute. Ampere generation devices, such as the A30 and A100, support compute only, so they require an AIE vGPU component. AIE components are available for all NVIDIA drivers since vGPU 13.
VSPHERE ML ACCELERATOR SPECTRUM DEEP DIVE – NVAIE CLOUD LICENSE SERVICE SETUP
Next in this series is installing the NVAIE GPU operator on a TKGs guest cluster. However, we must satisfy a few requirements before we can get to that step:
NVIDIA NVAIE Licence activated
Access to NVIDIA NGC NVIDIA Enterprise Catalog and Licensing Portal
The license Server Instance activated
NVIDIA vGPU Manager installed on ESXi Host with NVIDIA GPU Installed
VM Class with GPU specification configured
Ubuntu image available in the content library for TKGs Worker Node