I was building a new lab with NVIDIA A30 GPUs in a few hosts, and after installing the NVIDIA driver on the ESXi hosts, I got an error when powering on a VM with a vGPU profile. Typically, that error indicates one of three things:
- Shared Direct passthrough is not enabled on the GPU
- ECC memory is enabled
- VM memory reservation was not set to reserve the VM's full memory range
But Shared Direct passthrough was enabled, and because I was using a C-type profile on an NVIDIA A30 GPU, I did not have to disable ECC memory (see section 3.4, "Disabling and Enabling ECC Memory," in the NVIDIA Virtual GPU software documentation).
I discovered that my systems did not have SR-IOV enabled in the BIOS. After enabling the "SR-IOV Global Enable" setting, I could finally boot the VM.
SR-IOV is also required if you want to use vGPU with Multi-Instance GPU (MIG), so check for this BIOS setting when setting up your ESXi hosts.
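If you want to confirm from the ESXi shell that SR-IOV is actually active for the GPU, one quick check is to list the NVIDIA PCI devices. This is a sketch that assumes your ESXi build includes the lspci utility and that the vGPU manager has brought up the GPU's virtual functions; with SR-IOV enabled, the physical function should be joined by additional virtual function entries:

# List NVIDIA PCI devices; with SR-IOV active you should see
# the physical GPU plus a set of virtual function entries
lspci | grep -i nvidia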
But for completeness’ sake, let’s go over shared direct passthrough and GPU ECC memory configurations and see how to check both settings:
Shared Direct Passthrough
Step 1: Select the ESXi host with the GPU in the inventory view in vCenter
Step 2: Select the Configure tab in the pane on the right side of the screen
Step 3: Select Graphics in the Hardware section
Step 4: Under Graphics Devices, verify that the GPU is set to Shared Direct (select the device and click Edit to change it)
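You can also check (and change) this setting from the ESXi shell; the sketch below assumes a vSphere version where the esxcli graphics namespace is available. Note that Shared Direct appears as "SharedPassthru" in esxcli terms:

# Show the host's default graphics type and assignment policy
esxcli graphics host get

# List the graphics devices and their active type
esxcli graphics device list

# Set the default graphics type to Shared Direct
esxcli graphics host set --default-type SharedPassthru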
Disabling ECC Memory on the GPU Device
To disable ECC memory on the GPU device, you use the nvidia-smi command from the ESXi host shell. Ensure SSH is enabled on the host (select the ESXi host, go to Configure > System > Services, select SSH, and click Start).
# Check the current ECC mode of the GPU
nvidia-smi --query-gpu=ecc.mode.current --format=csv

# Disable ECC (0 = disabled, 1 = enabled)
nvidia-smi -e 0

# Verify that the change is scheduled; it takes effect after a reboot
nvidia-smi --query-gpu=ecc.mode.pending --format=csv
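For reference, the query commands print a CSV header followed by one value per GPU, so before the change the output looks something like this (sketched here, not captured from my hosts):

ecc.mode.current
Enabled

After running nvidia-smi -e 0, the pending mode reports Disabled while the current mode stays Enabled; the new mode only becomes active once the host is rebooted (or the GPU is reset).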