frankdenneman Frank Denneman is the Chief Technologist for AI at VMware by Broadcom. He is an author of the vSphere host and clustering deep dive series, as well as a podcast host for the Unexplored Territory podcast. You can follow him on Twitter @frankdenneman

vSphere ML Accelerator Spectrum Deep Dive – GPU Device Differentiators

3 min read

The two last parts reviewed the capabilities of the platform. vSphere can offer fractional GPUs to Multi-GPU setups, catering to the workload’s needs in every stage of its development life cycle. Let’s look at the features and functionality of each supported GPU device. Currently, the range of supported GPU devices is quite broad. In total, 29 GPU devices are supported, dating back from 2016 to the last release in 2023. A table at the end of the article includes links to each GPUs product brief and their datasheet. Although NVIDIA and VMware form a close partnership, the listed support of devices is not a complete match. This can lead to some interesting questions typically answered with; it should work. But as always, if you want bulletproof support, follow the guides to ensure more leisure time on weekends and nights.

VMware HCL and NVAIE Support

The first overview shows the GPU device spectrum and if NVAIE supports them. The VMware HCL supports every device listed in this overview, but NVIDIA decided not to put some of the older devices through their NVAIE certification program. As this is a series about Machine Learning, the diagram shows the support of the device and a C-series vGPU type. The VMware compatibility guide has an AI/ML column, listed as Compute, for a specific certification program that tests these capabilities. If the driver offers a C-series type, the device can run GPU-assisted applications; therefore, I’m listing some older GPU devices that customers still use. With some other devices, VMware hasn’t tested the compute capabilities, but NVIDIA has, and therefore there might be some discrepancies between the VMware HCL and NVAIE supportability matrix. For the newer models, the supportability matrix is aligned. Review the table and follow the GPU device HCL page link to view the supported NVIDIA driver version for your vSphere release.

The Y axis shows the device Interface type and possible slot consumption. This allows for easy analysis of whether a device is the “right fit” for edge locations. Due to space constraints, single-slot PCIe cards allow for denser or smaller configurations. Although every NVIDIA device supported by NVAIE can provide time-shared fractional GPUs, not all provide spatial MIG functionality. A subdivision is made on the Y-axis to show that distinction. The X-axis represents the GPU memory available per device. It allows for easier selection if you know the workload’s technical requirements.

The Ampere A16 is the only device that is listed twice in these overviews. The A16 device uses a dual-slot PCIe interface to offer four distinct GPUs on a single PCB card. The card contains 64GB GPU memory, but vSphere shall report four devices offering 16G of GPU memory. I thought this was the best solution to avoid confusion or remarks that the A16 was omitted, as some architects like to calculate the overall available GPU memory capacity per PCIe slot.

NVLink Support

If you plan to create a platform that supports distributed training using multi-GPU technology, this overview shows the available and supported NVLinks bandwidth capabilities. Not all GPU devices include NVLink support, and the ones with support can wildly differ. The MIG capability is omitted as MIG technology does not support NVLink.

NVIDIA Encoder Support

The GPU decodes the video file before running it through an ML model. But it depends on the process following the outcome of the model prediction, whether to encode the video again and replay it to a display. With some models, the action required after, for example, an anomaly detection, is to generate a warning event. But if a human needs to look at the video for verification, a hardware encoder must be available on the GPU. The Q-series vGPU type is required to utilize the encoders. What may surprise most readers is that most high-end datacenter does not have encoders. This can affect the GPU selection process if you want to create isolated media streams at the edge using MIG technology. Other GPU devices might be a better choice or investigate the performance impact of CPU encoding.

NVIDIA Decoder Support

Every GPU has at least one decoder, but many have more. With MIG, you can assign and isolate decoders to a specific workload. When a GPU is time-sliced, the active workload utilizes all GPU decoders available. Please note that the A16 list has eight decoders, but each distinct GPU on the A16 exposes two decoders to the workload.

GPUDirect RDMA Support

GPUDirect RDMA is supported on all time-sliced and MIG-backed C-series vGPUs on GPU devices that support single root I/O virtualization (SR-IOV). Please note that Linux is the only supported Guest OS for GPUDirect technology. Unfortunately, MS Windows isn’t supported.

Power Consumption

When deploying at an edge location, power consumption can be a constraint. This table list the specified power consumption of each GPU device.

Supported GPUs Overview

The table contains all the GPUs depicted in the diagrams above. Instead of repeating non-descriptive labels like webpage or PDFs, the table shows the GPU release date while linking to its product brief. The label for the datasheet indicates the amount of GPU memory, allowing for easy GPU selection if you want to compare specific GPU devices. Please note that VMware has not conducted C-series vGPU type tests on the device if the HCL Column indicates No. However, the NVIDIA driver does support the C-series vGPU type.

ArchitectureGPU DeviceHCL/ML SupportNVAIE 3.0 SupportProduct BriefDatasheet
PascalTesla P100NoNoOctober 201616GB
PascalTesla P6NoNoMarch 201716GB
VoltaTesla V100NoYesSeptember 201716GB
TuringT4NoYesOctober 201816GB
AmpereA2YesNoNovember 202116GB
PascalP40NoYesNovember 201624GB
TuringRTX 6000 passiveNoYesDecember 201924GB
AmpereRTX A5000NoYesApril 202124GB
AmpereRTX A5500N/AYesMarch 202224GB
AmpereA30YesYesMarch 202124GB
AmpereA30XYesYesMarch 202124GB
AmpereA 10YesYesMarch 202124GB
Ada LovelaceL4YesYesMarch 202324GB
VoltaTesla V100(S)NoYesMarch 201832GB
AmpereA100 (HGX)N/AYesSeptember 202040GB
TuringRTX 8000 passiveNoYesDecember 201948GB
AmpereA40YesYesMay 202048GB
AmpereRTX A6000NoYesDecember 202248GB
Ada LovelaceRTX 6000 AdaN/AYesDecember 202248GB
Ada LovelaceL40YesYesOctober 202048GB
AmpereA 16YesYesJune 202164GB
AmpereA100YesYesJune 202180GB
AmpereA100XYesYesJune 202180GB
AmpereA100 HGXN/AYesNovember 202080GB
Ada LovelaceH100YesYesSeptember 202280GB

Other articles in the vSphere ML Accelerator Spectrum Deep Dive

frankdenneman Frank Denneman is the Chief Technologist for AI at VMware by Broadcom. He is an author of the vSphere host and clustering deep dive series, as well as a podcast host for the Unexplored Territory podcast. You can follow him on Twitter @frankdenneman