Architecting AI Infrastructure

The Architecting AI Infrastructure series examines how AI workloads behave across different systems and how that behavior shapes platform design choices. The articles break down the fundamentals of placement, scheduling, and resource use, and the accompanying tools show how these ideas play out in practice. Tools in this series include the vGPU Silo Capacity Calculator (how profile catalogs influence long-term deployable capacity under placement limits) and Same-size vs Mixed-size Placement (how Same-size and Mixed-size modes behave as placement decisions accumulate over time).

Series overview
  1. Architecting AI Infrastructure - Part 1 In earlier articles, I looked at how modern AI models use GPU resources. I covered dynamic memory consumption, activation patterns, and how designs like mixture-of-experts …
  2. Architecting AI Infrastructure - Part 2 The previous article covered GPU placement as part of the platform’s lifecycle, not just a scheduling step. These choices affect what the platform can handle as workloads evolve. …
  3. Architecting AI Infrastructure - Part 3 In the first two articles, I looked at GPU consumption models and how AI workloads state their accelerator needs. In vSphere, these models take shape through virtual machine …
  4. Architecting AI Infrastructure - Part 4 In the last article, we tracked a GPU-backed VM from resource configuration to host selection. DRS evaluated the cluster, Assignable Hardware filtered hosts for GPU compatibility, …
  5. Architecting AI Infrastructure - Part 5 In the previous article, we looked at how GPUs are placed within an ESXi host and how GPU modes and assignment policies determine which physical GPU a workload uses. These …
  6. Architecting AI Infrastructure - Part 6 Last time, I looked at how Same Size vGPU mode works with different assignment policies and how right-sizing profiles can make placement more flexible. The main point was that both …
  7. How Same-size and Mixed-size vGPU placement behavior evolves at cluster scale and how profile strategy influences deployable capacity over time.
  8. A deep dive into MIG partitioning, placement geometry, and stranded capacity in GPU infrastructure for AI workloads.
  9. Why distributed inference turns GPU communication into part of the critical path, and why topology-aware scheduling is required when models span multiple GPUs.