Architecting AI Infrastructure

The Architecting AI Infrastructure series examines how AI workloads operate across different systems and how this shapes platform design choices. The articles cover the fundamentals of placement, scheduling, and resource use, and the accompanying tools show how these ideas play out in practice. The first tool in the series is the online vGPU Silo Capacity Calculator, which shows how your selected vGPU profiles affect available GPU capacity once real placement limits are applied.
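To make the silo idea concrete, here is a minimal sketch of the kind of arithmetic such a calculator performs. The profile names, framebuffer sizes, and GPU memory figure below are illustrative assumptions, not values from the series or the calculator itself; the sketch assumes the common same-size placement rule, under which a physical GPU that hosts one vGPU profile can only host identical profiles, turning each GPU into a silo for a single profile type.

```python
# Hypothetical sketch of silo capacity math. All profile names and
# sizes are illustrative assumptions, not taken from the series.
# Assumed rule: once a physical GPU hosts a vGPU profile, only
# identical profiles can share it (a "silo" per profile type).

GPU_MEMORY_GB = 80  # illustrative: one 80 GB GPU


def vms_per_gpu(profile_fb_gb: int, gpu_fb_gb: int = GPU_MEMORY_GB) -> int:
    """How many vGPU instances of one profile fit on a single GPU."""
    return gpu_fb_gb // profile_fb_gb


def cluster_capacity(gpus_per_profile: dict[str, int],
                     profile_fb_gb: dict[str, int]) -> dict[str, int]:
    """Total VM slots per profile, given how many GPUs are siloed
    (dedicated) to each profile type."""
    return {
        profile: gpus * vms_per_gpu(profile_fb_gb[profile])
        for profile, gpus in gpus_per_profile.items()
    }


# Example: 4 GPUs split 2/2 between a 20 GB and a 40 GB profile.
profiles = {"20gb-profile": 20, "40gb-profile": 40}
plan = {"20gb-profile": 2, "40gb-profile": 2}
print(cluster_capacity(plan, profiles))
# → {'20gb-profile': 8, '40gb-profile': 4}
```

The point the sketch illustrates is that capacity is not just total GPU memory divided by profile size: dedicating GPUs to one profile removes them from the pool for every other profile, which is exactly the trade-off the calculator lets you explore.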

Series overview
  1. Architecting AI Infrastructure Series - Part 1: In earlier articles, I looked at how modern AI models use GPU resources. I covered dynamic memory consumption, activation patterns, and how designs like mixture-of-experts …
  2. Architecting AI Infrastructure - Part 2: The previous article covered GPU placement as part of the platform’s lifecycle, not just a scheduling step. These choices affect what the platform can handle as workloads evolve. …
  3. Architecting AI Infrastructure - Part 3: In the first two articles, I looked at GPU consumption models and how AI workloads state their accelerator needs. In vSphere, these models take shape through virtual machine …
  4. Architecting AI Infrastructure - Part 4: In the last article, we tracked a GPU-backed VM from resource configuration to host selection. DRS evaluated the cluster, Assignable Hardware filtered hosts for GPU compatibility, …
  5. Architecting AI Infrastructure - Part 5: In the previous article, we looked at how GPUs are placed within an ESXi host and how GPU modes and assignment policies determine which physical GPU a workload uses. These …
  6. Architecting AI Infrastructure - Part 6: Last time, I looked at how Same Size vGPU mode works with different assignment policies and how right-sizing profiles can make placement more flexible. The main point was that both …