GPU Kubeflow cluster timeline and advice
I’ve been having trouble enabling GPUs on the Kubeflow cluster I recently set up.
Per this discussion, it seems that microk8s enable gpu works best on microk8s version 1.22 for people who already have nvidia-container-runtime installed on their system. However, as is well known by now, the Kubeflow add-on is only supported up to version 1.21 of microk8s. I’ve tried both:
- Going through the steps to enable GPUs with microk8s v1.21. The logs show the operator still installing its own nvidia-container-runtime, even though I explicitly pass --set driver.enabled=false when calling helm3 install.
- Going through the steps of using juju and charmed operators to bootstrap a Kubeflow cluster in microk8s v1.22, where I see the same Seldon error as reported in #2496.
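A possible explanation for the first attempt: in the NVIDIA GPU Operator Helm chart, driver.enabled only controls the driver container, while the container toolkit (which provides nvidia-container-runtime) is governed by a separate toolkit.enabled value. The Python sketch below only assembles such a command without running it. The release name "gpu-operator", the chart reference "nvidia/gpu-operator" and the use of microk8s helm3 are assumptions; check the value names against the values.yaml of the chart version you actually install.

```python
import shlex
from typing import List


def gpu_operator_install_args(
    release: str = "gpu-operator",       # hypothetical release name
    chart: str = "nvidia/gpu-operator",  # assumes the NVIDIA Helm repo was added as "nvidia"
    host_has_driver: bool = True,
    host_has_toolkit: bool = True,
) -> List[str]:
    """Build (but do not run) a helm3 install command for the GPU Operator."""
    args = ["microk8s", "helm3", "install", release, chart]
    if host_has_driver:
        # Skip the driver container because the host already has a driver.
        args += ["--set", "driver.enabled=false"]
    if host_has_toolkit:
        # Skip the toolkit container so the operator does not install
        # its own nvidia-container-runtime on top of the host's.
        args += ["--set", "toolkit.enabled=false"]
    return args


if __name__ == "__main__":
    print(shlex.join(gpu_operator_install_args()))
```

Keeping the two flags tied to separate booleans makes it explicit that disabling the driver does not, by itself, disable the toolkit.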
What should I do? Uninstall nvidia-container-runtime on my host and cross my fingers that microk8s enable gpu will work in that case? If there’s any way I can contribute to getting Kubeflow running in microk8s v1.22, I’m willing to chip in and help. Any guidance at all on solving this problem would be greatly appreciated.
Issue Analytics
- Created 2 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
Hi @odellus, a suggestion would be to use v1.20, because GPU support on 1.21 is not in a good state and 1.22 does not have the Kubeflow add-on.
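The version constraints described in this thread can be summarized as a small lookup. This is a hypothetical helper that only encodes what was said here about microk8s 1.20, 1.21 and 1.22, not an official compatibility matrix; channel strings such as "1.21/stable" are assumed to follow the major.minor/track form.

```python
from typing import Dict, Tuple

# Only what this thread states; other versions are deliberately left out.
THREAD_NOTES: Dict[Tuple[int, int], str] = {
    (1, 20): "suggested: Kubeflow add-on available, GPU support usable",
    (1, 21): "Kubeflow add-on available, but GPU support not in a good state",
    (1, 22): "no Kubeflow add-on",
}


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn 'v1.21.5', '1.21' or '1.21/stable' into (major, minor)."""
    parts = version.split("/")[0].lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def notes_for(version: str) -> str:
    """Look up the thread's advice for a given microk8s version or channel."""
    return THREAD_NOTES.get(parse_minor(version), "not covered in this thread")


if __name__ == "__main__":
    for v in ("1.20/stable", "v1.21.5", "1.22"):
        print(v, "->", notes_for(v))
```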
The trick to enabling GPUs on microk8s v1.20 was to install the CUDA drivers with the local .run script instead of installing the .deb files with dpkg. Closing.