Install Nvidia GPU on Proxmox K3s LXC
A guide to installing Nvidia drivers for a privileged Proxmox LXC so that K3s pods running in it can use the GPU.
Installing Nvidia Drivers on the Proxmox Host
Note the driver version you install; you will need to install the same version later in the K3s LXC. Use the following commands to install the Nvidia driver on your Proxmox host (from the official Proxmox docs, linked under References):
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
apt update
apt install dkms libc6-dev proxmox-default-headers --no-install-recommends
wget -O NVIDIA-Linux-x86_64-550.120.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
chmod +x NVIDIA-Linux-x86_64-550.120.run
./NVIDIA-Linux-x86_64-550.120.run --no-nouveau-check --dkms
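Because the same driver version has to be installed again in the K3s LXC, it may help to keep the version in one variable and derive the installer name and URL from it. The following is a minimal sketch, not part of the original instructions: the helper names `nvidia_runfile_name` and `nvidia_runfile_url` and the variable `NVIDIA_VERSION` are hypothetical, and the URL pattern is simply copied from the `wget` command above, so it is an assumption that other versions follow the same layout.

```shell
# Hypothetical helpers: derive the installer file name and download URL
# from a single driver version so host and LXC stay in sync.
NVIDIA_VERSION="550.120"  # version used in this guide; change as needed

nvidia_runfile_name() {
    # $1 = driver version, e.g. 550.120
    printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

nvidia_runfile_url() {
    # URL layout assumed from the wget command in this guide
    printf 'https://us.download.nvidia.com/XFree86/Linux-x86_64/%s/%s\n' \
        "$1" "$(nvidia_runfile_name "$1")"
}

# Print the URL for the pinned version
nvidia_runfile_url "$NVIDIA_VERSION"
```

The printed URL can then be passed to `wget -O "$(nvidia_runfile_name "$NVIDIA_VERSION")"` on both the host and the LXC.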
You must also add the following udev rules to create the Nvidia devices on the Proxmox host:
cat << EOF > /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
EOF
Reboot your Proxmox host to load the Nvidia driver and create the udev devices.
Configuring the LXC
Once the driver is loaded and the Nvidia devices exist, you need to obtain their major device numbers. Run the following on the Proxmox host; the major number is the first of the two comma-separated numbers shown for each device. In my case they are 195 and 236:
root@pve-media:~# ls -la /dev/nvid*
crw-rw-rw- 1 root root 195, 0 Sep 27 19:40 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Sep 27 19:40 /dev/nvidiactl
crw-rw-rw- 1 root root 236, 0 Sep 27 19:40 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236, 1 Sep 27 19:40 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root 80 Sep 27 19:40 .
drwxr-xr-x 20 root root 5060 Sep 27 20:08 ..
cr-------- 1 root root 239, 1 Sep 27 19:40 nvidia-cap1
cr--r--r-- 1 root root 239, 2 Sep 27 19:40 nvidia-cap2
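If you would rather not read the numbers off by eye, the major numbers can be pulled out of the listing with a small filter. This is only a sketch under the assumption that the `ls -l` output has the same column layout as shown above; the function name `nvidia_majors_from_ls` is hypothetical. It keeps character devices whose path starts with `/dev/nvidia` and therefore skips the entries under `/dev/nvidia-caps`, which this guide does not use.

```shell
# Hypothetical helper: read `ls -l` output on stdin and print the unique
# major numbers of the /dev/nvidia* character devices.
nvidia_majors_from_ls() {
    # Field 1 starts with "c" for character devices; field 5 is "MAJOR,".
    awk '$1 ~ /^c/ && $NF ~ /^\/dev\/nvidia/ { sub(/,$/, "", $5); print $5 }' |
        sort -un
}

# Usage on the Proxmox host:
#   ls -l /dev/nvidia* | nvidia_majors_from_ls
```

With the listing above as input, this would print 195 and 236, one per line.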
Next, add the following to the LXC config, changing the device numbers as needed.
Edit: /etc/pve/lxc/<lxc_id>.conf
mp1: /usr/lib/modules,mp=/usr/lib/modules
lxc.cgroup2.devices.allow: c 195:* rw
lxc.cgroup2.devices.allow: c 236:* rw
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
These lines perform the following for your LXC (in order):
- Mount the host kernel headers so the gpu-operator Helm chart on K3s can build the Nvidia drivers
- Create cgroup2 allowlist entries for the major device numbers of the Nvidia devices
- Pass the Nvidia devices through to the LXC
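Since the cgroup2 and mount entries differ between machines only in the major numbers, they could be generated rather than typed by hand. The sketch below is an illustration, not the author's method: the function name `nvidia_lxc_config` is hypothetical, the device list is the four devices used in this guide, and the `mp1` line is left out because it does not depend on the device numbers.

```shell
# Hypothetical helper: print the cgroup2 allowlist and bind-mount lines
# for the major numbers given as arguments (e.g. 195 236).
nvidia_lxc_config() {
    for major in "$@"; do
        printf 'lxc.cgroup2.devices.allow: c %s:* rw\n' "$major"
    done
    # Device names assumed to match the ones listed in this guide
    for dev in nvidia0 nvidiactl nvidia-uvm nvidia-uvm-tools; do
        printf 'lxc.mount.entry: /dev/%s dev/%s none bind,optional,create=file\n' \
            "$dev" "$dev"
    done
}

# Example with the major numbers from this guide
nvidia_lxc_config 195 236
```

Review the output before appending it to /etc/pve/lxc/<lxc_id>.conf, and substitute your own major numbers.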
References
- https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation
- https://github.com/UntouchedWagons/K3S-NVidia
- https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
- https://forum.proxmox.com/threads/sharing-gpu-to-lxc-container-failed-to-initialize-nvml-unknown-error.98905/
