Commit 186689

2024-10-16 15:18:33 admin: nvidia lxc install
kubernetes/use nvidia gpu on proxmox k3s lxc.md ..
@@ 1,8 1,13 @@
# Install Nvidia GPU on Proxmox K3s LXC
Guide for installing Nvidia drivers on a Proxmox privileged LXC to enable their use in K3s pods.
+ ## Software
+ - Proxmox v8.2.2
+ - Debian LXC v12.7
+ - K3s v1.30.5
+
## Installing Nvidia Drivers on the Proxmox Host
- Make note of the driver version you install as you'll need to install the same version later on the K3s LXC. Use the following instructions to install the Nvidia driver on your Proxmox host ([Proxmox official docs](https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation)):
+ Make note of the driver version you install as you'll need to install the same version later on the K3s LXC. Use the following instructions to install the Nvidia driver on your Proxmox host ([Proxmox official guide](https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation)):
```
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
@@ 22,7 27,30 @@
EOF
```
- Reboot your Proxmox host to load the Nvidia driver and create the udev devices.
+ Reboot your Proxmox host to load the Nvidia driver and create the udev devices. You can verify the drivers are working with `nvidia-smi`:
+ ```
+ root@pve-media:~# nvidia-smi
+ Wed Oct 16 11:17:11 2024
+ +-----------------------------------------------------------------------------------------+
+ | NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
+ |-----------------------------------------+------------------------+----------------------+
+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
+ | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
+ | | | MIG M. |
+ |=========================================+========================+======================|
+ | 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
+ | 0% 35C P8 16W / 210W | 0MiB / 8192MiB | 0% Default |
+ | | | N/A |
+ +-----------------------------------------+------------------------+----------------------+
+
+ +-----------------------------------------------------------------------------------------+
+ | Processes: |
+ | GPU GI CI PID Type Process name GPU Memory |
+ | ID ID Usage |
+ |=========================================================================================|
+ | No running processes found |
+ +-----------------------------------------------------------------------------------------+
+ ```
## Configuring the LXC
Once the driver is loaded and the Nvidia devices exist you need to obtain their major device numbers. Run the following on the Proxmox host to get your numbers. In my case they are 195 and 236:
@@ 59,9 87,64 @@
- Create cgroup2 allowlist entries for the major device numbers of the Nvidia devices
- Pass the Nvidia devices through to the LXC
+ ### Installing Nvidia Driver on LXC
+ On the LXC, install the same Nvidia driver version as on the host, this time with the `--no-kernel-module` option, because the LXC shares the Proxmox host's kernel and uses the module already loaded there:
+ ```
+ wget -O NVIDIA-Linux-x86_64-550.120.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
+ chmod +x NVIDIA-Linux-x86_64-550.120.run
+ ./NVIDIA-Linux-x86_64-550.120.run --no-kernel-module
+ ```
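
Because the host and the LXC must run the same driver version, it can help to derive the installer file name and download URL from a single version variable. The following is a minimal sketch, not part of the original guide: the helper names `installer_name` and `installer_url` are hypothetical, and the URL pattern is assumed from the download link used above.

```shell
#!/bin/sh
# Assumption: this is the driver version noted on the Proxmox host.
DRIVER_VERSION="550.120"

# Hypothetical helper: installer file name for a given driver version.
installer_name() {
    printf 'NVIDIA-Linux-x86_64-%s.run' "$1"
}

# Hypothetical helper: download URL, following the pattern of the link above.
installer_url() {
    printf 'https://us.download.nvidia.com/XFree86/Linux-x86_64/%s/%s' \
        "$1" "$(installer_name "$1")"
}

# Print the URL so it can be checked before downloading anything.
echo "$(installer_url "$DRIVER_VERSION")"

# Usage on the LXC, mirroring the commands above:
#   wget -O "$(installer_name "$DRIVER_VERSION")" "$(installer_url "$DRIVER_VERSION")"
#   chmod +x "$(installer_name "$DRIVER_VERSION")"
#   "./$(installer_name "$DRIVER_VERSION")" --no-kernel-module
```

Changing `DRIVER_VERSION` in one place then keeps the file name and URL consistent with each other.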
+
+ ### Installing the Nvidia Container Toolkit
+ Next you need to install the Nvidia container toolkit. Start by running the following to add the repository to Apt ([Nvidia official guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)):
+ ```
+ apt install -y gpg curl
+
+ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg --yes \
+ && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+ sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+ tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+ ```
+
+ Then install the container toolkit:
+ ```
+ apt update
+ apt install -y nvidia-container-runtime
+ ```
+
+ Debian is not officially supported, so you have to create a symbolic link for `ldconfig` so that the [nvidia-container-cli](https://github.com/NVIDIA/nvidia-container-toolkit/issues/147) can find it:
+ ```
+ ln -s /sbin/ldconfig /sbin/ldconfig.real
+ ```
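
Running `ln -s` a second time fails if the link already exists, so a re-runnable variant may be convenient when scripting the setup. This is a minimal sketch under that assumption; the function name `ensure_symlink` is hypothetical and not part of the original guide.

```shell
#!/bin/sh
# Hypothetical helper: create a symlink only if nothing exists at the link path.
ensure_symlink() {
    target="$1"
    link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        # Something (file or link, even a dangling one) is already there.
        echo "exists: $link"
    else
        ln -s "$target" "$link" && echo "created: $link -> $target"
    fi
}

# Usage on the LXC (as root), equivalent to the command above:
#   ensure_symlink /sbin/ldconfig /sbin/ldconfig.real
```

The function never overwrites an existing path, so on a system that already ships `/sbin/ldconfig.real` it leaves that file untouched.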
+
+ Reboot the LXC and verify the drivers are loaded and working with `nvidia-smi`:
+ ```
+ root@k3s-media:~# nvidia-smi
+ Wed Oct 16 11:15:20 2024
+ +-----------------------------------------------------------------------------------------+
+ | NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
+ |-----------------------------------------+------------------------+----------------------+
+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
+ | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
+ | | | MIG M. |
+ |=========================================+========================+======================|
+ | 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
+ | 0% 35C P8 16W / 210W | 0MiB / 8192MiB | 0% Default |
+ | | | N/A |
+ +-----------------------------------------+------------------------+----------------------+
+
+ +-----------------------------------------------------------------------------------------+
+ | Processes: |
+ | GPU GI CI PID Type Process name GPU Memory |
+ | ID ID Usage |
+ |=========================================================================================|
+ | No running processes found |
+ +-----------------------------------------------------------------------------------------+
+ ```
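
Instead of comparing the two `nvidia-smi` tables by eye, the driver version can be queried directly and compared against the version installed on the host. The following is a minimal sketch: `EXPECTED_VERSION` and `check_driver_version` are hypothetical names, and the expected value is assumed to be the version noted during the host install.

```shell
#!/bin/sh
# Assumption: the driver version installed on the Proxmox host.
EXPECTED_VERSION="550.120"

# Hypothetical helper: compare an expected and an actual driver version.
check_driver_version() {
    expected="$1"
    actual="$2"
    if [ "$actual" = "$expected" ]; then
        echo "OK: driver $actual matches"
    else
        echo "MISMATCH: expected $expected, got ${actual:-nothing}"
        return 1
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query only the driver version of the first GPU, without the table.
    actual="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    check_driver_version "$EXPECTED_VERSION" "$actual"
else
    echo "nvidia-smi not found; is the driver installed?"
fi
```

The same script can be run unchanged on both the Proxmox host and the LXC.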
# References
- https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation
- https://github.com/UntouchedWagons/K3S-NVidia
- https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
- https://forum.proxmox.com/threads/sharing-gpu-to-lxc-container-failed-to-initialize-nvml-unknown-error.98905/
+ - https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation
+ - https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
+ - https://github.com/UntouchedWagons/K3S-NVidia
+ - https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
+ - https://forum.proxmox.com/threads/sharing-gpu-to-lxc-container-failed-to-initialize-nvml-unknown-error.98905/