# Install Nvidia GPU on Proxmox K3s LXC
A guide for installing Nvidia drivers on a privileged Proxmox LXC to enable their use in K3s pods.

## Software
- Proxmox v8.2.2
- Debian LXC v12.7
- K3s v1.30.5

## Installing Nvidia Drivers on the Proxmox Host
Make note of the driver version you install, as you'll need to install the same version later on the K3s LXC. Use the following instructions to install the Nvidia driver on your Proxmox host ([Proxmox official guide](https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation)):
```
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

apt update
apt install -y dkms libc6-dev proxmox-default-headers --no-install-recommends

wget -O NVIDIA-Linux-x86_64-550.120.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
chmod +x NVIDIA-Linux-x86_64-550.120.run
./NVIDIA-Linux-x86_64-550.120.run --no-nouveau-check --dkms
```
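Optionally, before rebooting, you can check that DKMS registered and built the module against your running kernel (the exact output format varies by driver and DKMS version):
```
dkms status
# expect a line similar to: nvidia/550.120, <kernel version>, x86_64: installed
```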
You must also add the following udev rules to create the Nvidia devices on the Proxmox host:
```
cat << EOF > /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
EOF
```
Reboot your Proxmox host to load the Nvidia driver and create the device nodes via the udev rules. You can verify the driver is working with `nvidia-smi`:
```
root@pve-media:~# nvidia-smi
Wed Oct 16 11:17:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 16W / 210W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
```
## Configuring the LXC
Once the driver is loaded and the Nvidia device nodes exist, you need to obtain their major device numbers. Run the following on the Proxmox host to get your numbers; in my case they are 195 and 236:
```
root@pve-media:~# ls -la /dev/nvid*
crw-rw-rw- 1 root root 195, 0 Sep 27 19:40 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Sep 27 19:40 /dev/nvidiactl
crw-rw-rw- 1 root root 236, 0 Sep 27 19:40 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236, 1 Sep 27 19:40 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root 80 Sep 27 19:40 .
drwxr-xr-x 20 root root 5060 Sep 27 20:08 ..
cr-------- 1 root root 239, 1 Sep 27 19:40 nvidia-cap1
cr--r--r-- 1 root root 239, 2 Sep 27 19:40 nvidia-cap2
```
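If you'd rather print the major numbers directly instead of reading them from the `ls` output, here is an optional sketch (GNU `stat` reports the major number in hex, so it is converted to decimal):
```
for dev in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools; do
  printf '%s major=%d\n' "$dev" "0x$(stat -c '%t' "$dev")"
done
```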
Next you must add the following to the LXC config, changing the device numbers as needed.

Edit: `/etc/pve/lxc/<lxc_id>.conf`
```
mp1: /usr/lib/modules,mp=/usr/lib/modules
lxc.cgroup2.devices.allow: c 195:* rw
lxc.cgroup2.devices.allow: c 236:* rw
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```
These lines perform the following for your LXC (in order):
- Mounts the host's kernel modules (`/usr/lib/modules`) so the gpu-operator Helm chart on K3s can build the Nvidia drivers
- Creates cgroup2 allowlist entries for the major device numbers of the Nvidia devices
- Passes the Nvidia device nodes through to the LXC via bind mounts
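After restarting the container, the device nodes should be visible inside it. A quick check from the Proxmox host (101 is an example container ID, substitute your own):
```
# restart the container so the new config is applied
pct reboot 101
# list the passed-through device nodes from inside the container
pct exec 101 -- ls -la /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
```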
421654 admin 2024-10-16 02:14:17 89
186689 admin 2024-10-16 15:18:33 90
### Installing the Nvidia Driver on the LXC
For the LXC you are going to install the same Nvidia driver version, but with the `--no-kernel-module` option, since the LXC shares the same kernel as your Proxmox host:
```
wget -O NVIDIA-Linux-x86_64-550.120.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
chmod +x NVIDIA-Linux-x86_64-550.120.run
./NVIDIA-Linux-x86_64-550.120.run --no-kernel-module
```
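Because the container uses the host's kernel module, the userspace driver installed here must match the module version loaded on the Proxmox host. One way to cross-check from inside the LXC:
```
# reports the kernel module version provided by the Proxmox host
cat /proc/driver/nvidia/version
```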
### Installing the Nvidia Container Toolkit
Next you need to install the Nvidia Container Toolkit. Start by running the following to add the repository to APT ([Nvidia official guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)):
```
apt install -y gpg curl

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg --yes \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
Then install the container toolkit:
```
apt update
apt install -y nvidia-container-runtime
```
Debian is not officially supported, so we have to create a symlink for `ldconfig` so that [nvidia-container-cli](https://github.com/NVIDIA/nvidia-container-toolkit/issues/147) can find it:
```
ln -s /sbin/ldconfig /sbin/ldconfig.real
```
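Optionally, you can sanity-check that the container toolkit can see the GPU before moving on (the output depends on your GPU and driver version):
```
# queries the driver through the container toolkit CLI
nvidia-container-cli info
```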
Reboot the LXC and verify the driver is loaded and working with `nvidia-smi`:
```
root@k3s-media:~# nvidia-smi
Wed Oct 16 11:15:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 16W / 210W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
```
## Configuring K3s
If you have done everything correctly, [K3s should automatically detect the Nvidia container runtime](https://docs.k3s.io/advanced#nvidia-container-runtime-support) when the service is started. You can verify this by running:
```
root@k3s-media:~# grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
```
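If the `nvidia` runtime does not show up, restarting K3s after the container toolkit is installed usually triggers re-detection (assuming the default service name; agent nodes use `k3s-agent` instead):
```
systemctl restart k3s
```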
You will then need to add a `RuntimeClass` definition to your cluster:
```
cat << EOF > nvidia-runtime-class.yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF

kubectl apply -f nvidia-runtime-class.yaml
```
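You can confirm the runtime class was registered:
```
kubectl get runtimeclass nvidia
```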
### Installing the gpu-operator
First add the `nvidia` Helm repo:
```
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
```
Next install the `gpu-operator` Helm chart using the following values file tailored for K3s:
```
cat << EOF > gpu-operator-values.yaml
toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
EOF

helm install gpu-operator nvidia/gpu-operator --namespace gpu-operator --create-namespace --values gpu-operator-values.yaml
```
This will create a number of pods in the `gpu-operator` namespace that will build the Nvidia drivers. You will see some of these pods restarting; this is normal, as some pods depend on others completing first. Overall, the build process should take a couple of minutes. If it is taking longer than 10 minutes you likely have an issue and should look at the logs of the `gpu-operator-node-feature-discovery-worker` pods (this is how I figured out that you need to mount the Proxmox host's kernel modules into the LXC, as the pod couldn't find them).
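A couple of commands that can help keep an eye on the rollout (pod names differ slightly between chart versions):
```
# watch the operator pods until they settle into Running/Completed
kubectl -n gpu-operator get pods --watch

# once finished, the node should advertise an allocatable nvidia.com/gpu resource
kubectl describe node | grep -i 'nvidia.com/gpu'
```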
You can verify that everything is working correctly by spinning up a pod that uses the GPU:
```
cat << EOF > gpu-benchmark-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:nbody
      args: ["nbody", "-gpu", "-benchmark"]
      resources:
        limits:
          nvidia.com/gpu: 1
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
EOF

kubectl apply -f gpu-benchmark-pod.yaml
```
GPU successfully identified and benchmarked:
```
root@k3s:~# k logs nbody-gpu-benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1

> Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1080]
20480 bodies, total time for 10 iterations: 14.370 ms
= 291.883 billion interactions per second
= 5837.668 single-precision GFLOP/s at 20 flops per interaction
```
## Final Notes
When running pods that use your GPU, you will only be able to see the GPU processes by running `nvidia-smi` on the Proxmox host; running `nvidia-smi` inside the LXC will not show them even while they are running.

## References
- https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Preparation
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- https://github.com/UntouchedWagons/K3S-NVidia
- https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
- https://forum.proxmox.com/threads/sharing-gpu-to-lxc-container-failed-to-initialize-nvml-unknown-error.98905/
- https://docs.k3s.io/advanced#nvidia-container-runtime-support