Commit 421654

2024-10-16 02:14:17 admin: pve host install guide
/dev/null .. kubernetes/use nvidia gpu on proxmox k3s lxc.md
@@ 0,0 1,74 @@
+ # Use Nvidia GPU on Proxmox K3s LXC
+
+ # Proxmox Host
+ This guide covers installing Nvidia drivers for K3s running in a privileged LXC. Make sure that you install the same Nvidia driver version in the LXC that you installed on the Proxmox (PVE) host.
+
+
+ Install Nvidia drivers on the PVE host. Make note of the driver version you install; you'll need to install the same version on the LXC.
+
+
+ Instructions taken from this [guide](https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation):
+ ```
+ echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
+
+ apt update
+ apt install dkms libc6-dev proxmox-default-headers --no-install-recommends
+
+ wget -O NVIDIA-Linux-x86_64-550.120.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
+ chmod +x NVIDIA-Linux-x86_64-550.120.run
+ ./NVIDIA-Linux-x86_64-550.120.run --no-nouveau-check --dkms
+ ```
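+
+ Because the LXC needs the same driver version, it can help to confirm what the host actually loaded. The following is a minimal sketch, not part of the linked guide: `installer_version` and `check_driver_version` are hypothetical helper names, and the check assumes `nvidia-smi` is on the `PATH` after the install.
+
+ ```shell
+ # Extract the version string from an installer filename such as
+ # NVIDIA-Linux-x86_64-550.120.run (hypothetical helper, not an Nvidia tool).
+ installer_version() {
+     local name="${1##*/}"                  # drop any leading directory
+     name="${name#NVIDIA-Linux-x86_64-}"    # drop the fixed filename prefix
+     printf '%s\n' "${name%.run}"           # drop the .run suffix
+ }
+
+ # Compare the installer version with the version the loaded driver reports.
+ check_driver_version() {
+     local expected loaded
+     expected="$(installer_version "$1")"
+     loaded="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
+     if [ "$expected" = "$loaded" ]; then
+         echo "driver $loaded matches installer"
+     else
+         echo "driver mismatch: installer=$expected loaded=$loaded" >&2
+         return 1
+     fi
+ }
+ ```
+
+ Running `check_driver_version NVIDIA-Linux-x86_64-550.120.run` on the host, and later inside the LXC, should report the same version in both places.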
+
+
+ You must add the following udev rules to create the Nvidia devices on the PVE host:
+ ```
+ cat << EOF > /etc/udev/rules.d/70-nvidia.rules
+ KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
+ KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
+ EOF
+ ```
+
+
+ Once the Nvidia devices exist, obtain their major device numbers (195 and 236 in this example):
+ ```
+ root@pve-media:~# ls -la /dev/nvid*
+ crw-rw-rw- 1 root root 195, 0 Sep 27 19:40 /dev/nvidia0
+ crw-rw-rw- 1 root root 195, 255 Sep 27 19:40 /dev/nvidiactl
+ crw-rw-rw- 1 root root 236, 0 Sep 27 19:40 /dev/nvidia-uvm
+ crw-rw-rw- 1 root root 236, 1 Sep 27 19:40 /dev/nvidia-uvm-tools
+
+ /dev/nvidia-caps:
+ total 0
+ drwxr-xr-x 2 root root 80 Sep 27 19:40 .
+ drwxr-xr-x 20 root root 5060 Sep 27 20:08 ..
+ cr-------- 1 root root 239, 1 Sep 27 19:40 nvidia-cap1
+ cr--r--r-- 1 root root 239, 2 Sep 27 19:40 nvidia-cap2
+ ```
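+
+ The major numbers can also be pulled out of the listing with a small filter instead of reading them by eye. This is a sketch under the assumption that the input is `ls -l` style output as shown above; `nvidia_majors` is a hypothetical helper name.
+
+ ```shell
+ # Read `ls -l` style lines on stdin and print the unique major numbers
+ # of the character devices (lines whose mode string starts with "c").
+ nvidia_majors() {
+     awk '/^c/ { sub(/,$/, "", $5); print $5 }' | sort -un
+ }
+
+ # Usage on the PVE host. The -d flag lists /dev/nvidia-caps itself rather
+ # than its contents, so only the top-level device nodes are considered:
+ #   ls -ld /dev/nvidia* | nvidia_majors
+ ```
+
+ For the listing above, this prints 195 and 236, the two values used in the LXC config.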
+
+
+ Next, add the following to the LXC config:
+
+ `/etc/pve/lxc/<lxc_id>.conf`
+
+ ```
+ mp1: /usr/lib/modules,mp=/usr/lib/modules
+ lxc.cgroup2.devices.allow: c 195:* rw
+ lxc.cgroup2.devices.allow: c 236:* rw
+ lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
+ lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
+ lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
+ lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
+ ```
+
+
+ These lines do the following:
+ - Mount the host kernel headers so the gpu-operator Helm chart on K3s can build the Nvidia drivers
+ - Create cgroup2 allowlist entries for the major device numbers of the Nvidia devices
+ - Pass the Nvidia devices through to the LXC using bind mounts
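+
+ Since the major numbers can differ between hosts, the cgroup2 and mount entries can be generated rather than typed by hand. The sketch below assumes the majors and device paths are passed in explicitly; `nvidia_lxc_config` is a hypothetical helper name, and it does not emit the `mp1` line.
+
+ ```shell
+ # Print LXC config lines for the given major numbers and device nodes.
+ # Usage: nvidia_lxc_config "195 236" /dev/nvidia0 /dev/nvidiactl ...
+ nvidia_lxc_config() {
+     local majors="$1"
+     shift
+     local major dev
+     # Intentionally unquoted so the space-separated list splits into words.
+     for major in $majors; do
+         printf 'lxc.cgroup2.devices.allow: c %s:* rw\n' "$major"
+     done
+     for dev in "$@"; do
+         # The mount target is relative to the container rootfs,
+         # so the leading slash is dropped.
+         printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' "$dev" "${dev#/}"
+     done
+ }
+ ```
+
+ The function only prints to stdout, so the output can be reviewed before it is added to `/etc/pve/lxc/<lxc_id>.conf`.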
+
+
+ # References
+ - https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation
+ - https://github.com/UntouchedWagons/K3S-NVidia
+ - https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
+ - https://forum.proxmox.com/threads/sharing-gpu-to-lxc-container-failed-to-initialize-nvml-unknown-error.98905/