GPU addon stops working after upgrading to Ubuntu 21.10 which uses cgroup v2
Please run microk8s inspect and attach the generated tarball to this issue.
$ k get all -n gpu-operator-resources
NAME                                           READY   STATUS                  RESTARTS       AGE
pod/nvidia-device-plugin-validator-6lg6f       0/1     Completed               0              8d
pod/nvidia-cuda-validator-8cmv6                0/1     Completed               0              8d
pod/nvidia-dcgm-exporter-7798m                 0/1     Init:CrashLoopBackOff   7 (106s ago)   9d
pod/nvidia-container-toolkit-daemonset-dlnq9   0/1     Init:CrashLoopBackOff   9 (98s ago)    9d
pod/nvidia-operator-validator-hl8rk            0/1     Init:CrashLoopBackOff   7 (99s ago)    9d
pod/nvidia-device-plugin-daemonset-rjdqf       0/1     Init:CrashLoopBackOff   9 (91s ago)    9d
pod/gpu-feature-discovery-hwq2s                0/1     Init:CrashLoopBackOff   9 (90s ago)    9d

NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/nvidia-dcgm-exporter   ClusterIP   10.152.183.145   <none>        9400/TCP   9d

NAME                                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                      AGE
daemonset.apps/nvidia-mig-manager                   0         0         0       0            0           nvidia.com/gpu.deploy.mig-manager=true             9d
daemonset.apps/nvidia-dcgm-exporter                 1         1         0       1            0           nvidia.com/gpu.deploy.dcgm-exporter=true           9d
daemonset.apps/nvidia-operator-validator            1         1         0       1            0           nvidia.com/gpu.deploy.operator-validator=true      9d
daemonset.apps/nvidia-device-plugin-daemonset       1         1         0       1            0           nvidia.com/gpu.deploy.device-plugin=true           9d
daemonset.apps/nvidia-container-toolkit-daemonset   1         1         0       1            0           nvidia.com/gpu.deploy.container-toolkit=true       9d
daemonset.apps/gpu-feature-discovery                1         1         0       1            0           nvidia.com/gpu.deploy.gpu-feature-discovery=true   9d
$ k describe pod nvidia-device-plugin-daemonset-rjdqf -n gpu-operator-resources
<snip>
Warning Failed 12m (x4 over 14m) kubelet Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: container error: cgroup subsystem devices not found: unknown
We appreciate your feedback. Thank you for using microk8s.
inspection-report-20211017_133715.tar.gz
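The key line in the describe output is "cgroup subsystem devices not found". Under cgroup v2's unified hierarchy there is no separate "devices" controller (device access is enforced with eBPF programs attached to cgroups instead), so one quick host-side check is whether the current process belongs to a cgroup v1 hierarchy that includes "devices". The sketch below assumes the standard /proc/self/cgroup format ("ID:controllers:path" for v1, a single "0::path" line for v2); the helper name has_v1_devices_cgroup is hypothetical and not part of microk8s or the NVIDIA GPU operator, and this is not necessarily the exact check nvidia-container-cli performs.

```shell
#!/bin/sh
# Hypothetical helper (not part of microk8s or the NVIDIA GPU operator):
# succeeds if the given cgroup membership file (default: /proc/self/cgroup)
# lists a cgroup v1 hierarchy that includes the "devices" controller.
has_v1_devices_cgroup() {
    file="${1:-/proc/self/cgroup}"
    # v1 lines look like "ID:controller[,controller...]:path";
    # the single v2 line looks like "0::path" (empty controller list).
    awk -F: '{
        n = split($2, ctrls, ",")
        for (i = 1; i <= n; i++) if (ctrls[i] == "devices") found = 1
    } END { exit (found ? 0 : 1) }' "$file"
}

if has_v1_devices_cgroup; then
    echo "cgroup v1 devices hierarchy present"
else
    echo "no cgroup v1 devices hierarchy (likely pure cgroup v2)"
fi
```

On a host booted with the Ubuntu 21.10 default (cgroup v2), the "no cgroup v1 devices hierarchy" branch would be the expected result, which is consistent with the error above.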
Issue Analytics
- Issue #2662, opened by khteh on Oct 16, ...
- State: Open
- Created 2 years ago
- Comments: 7 (6 by maintainers)
Top GitHub Comments
Yes, the workaround disables cgroup v2 as the default, in favour of v1. In my case, none of my other applications depend on cgroup v2, so this was a safe change. This only applies if you’re purposely running a version of microk8s prior to 1.22.
It doesn’t sound like my comment applies to your issue; I just wanted to document it for anyone else who runs across this potentially breaking change.
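Whether a host is currently using the unified cgroup v2 hierarchy or a v1 layout can be checked from the filesystem type mounted at /sys/fs/cgroup: GNU stat reports cgroup2fs for v2 and tmpfs for a v1 (legacy or hybrid) layout. A minimal sketch for confirming the default before and after a change like the one described in the comment; the helper name cgroup_mode_from_fstype is hypothetical, and the stat invocation assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type mounted at /sys/fs/cgroup
# to a cgroup mode name.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy (Ubuntu 21.10 default)
        tmpfs)     echo "v1" ;;       # legacy or hybrid v1 layout
        *)         echo "unknown" ;;  # stat failed or unexpected type
    esac
}

# GNU stat: -f reports on the filesystem, %T prints its type name.
fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || true)
echo "cgroup mode: $(cgroup_mode_from_fstype "$fstype")"
```

Keeping the mapping in its own function separates the decision from the stat call, so the same logic can be reused with a filesystem type obtained some other way.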
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.