Kubeflow listed as available AArch64 addon, but missing enable script

Per #1086, and my (limited) understanding from Kubeflow discussion elsewhere, I gathered that KF probably wouldn’t be available on AArch64 MicroK8s. I was surprised, then, when microk8s enable --help showed Kubeflow on the list, and again when I double-checked the supported architectures in addon-lists.yaml.

However, when I tried running microk8s.enable kubeflow, I got Nothing to do for kubeflow, which I believe I’ve traced back to the fact that ${SNAP}/actions/enable.kubeflow.sh doesn’t exist.
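
The check described above can be sketched as a small shell helper. This is only an illustration, not part of MicroK8s: the function name addon_has_enable_script is hypothetical, and /snap/microk8s/current is an assumed default snap location used when ${SNAP} is not set. The actions/enable.<addon>.sh layout is taken from the path quoted in the report.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MicroK8s): succeeds if the snap
# contains an enable script for the given addon.
#   $1: snap root directory (assumption: /snap/microk8s/current by default)
#   $2: addon name, e.g. kubeflow
addon_has_enable_script() {
    snap_root="$1"
    addon="$2"
    [ -f "${snap_root}/actions/enable.${addon}.sh" ]
}

# Example: look for the kubeflow enable script, falling back to the
# assumed default snap location when SNAP is unset.
if addon_has_enable_script "${SNAP:-/snap/microk8s/current}" kubeflow; then
    echo "enable script found for kubeflow"
else
    echo "no enable script for kubeflow"
fi
```

If the second message is printed on a machine where the addon is listed as available, that would match the mismatch reported here: the addon appears in the list, but the script that enables it is not in the snap.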

Is this expected behavior? I mostly just wanted to report this since I found it odd that KF is ostensibly listed as an available add-on, but the script to enable doesn’t seem to be shipped with the snap I installed. I suspect I may just be misunderstanding something here too.

I believe I’m using a fresh installation of the latest stable version as of today, but here’s the inspect archive for good measure. Thanks!

inspection-report-20200403_210033.tar.gz

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

MrXinWang commented, Apr 17, 2020 (1 reaction)

@matthiaslau If I remember correctly, I did test that (I assume that by distributed training you mean deploying Kubeflow on a multi-node k8s cluster and running training jobs there).

What I did was run the object detection example in a container with a self-compiled GPU build of TensorFlow 1.10, deploying everything with Kubeflow v0.5.1 on my cluster (an arm64 server as master and a Jetson TX2 as node). I had no problem with TFJob itself, and training did start, but after (I think) several steps it warned me about an OOM error, which seems a little weird to me…

I am pretty sure that I used the GPU, as I think the GPU build of TensorFlow automatically assigns tasks to the GPU, and I could see GPU0 in the log…

matthiaslau commented, Apr 7, 2020 (1 reaction)

Thank you for the answer! 😃 I am currently working with a cluster of NVIDIA Jetson Nanos (GPU: Maxwell with 128 cores, CPU: quad-core ARM A57 at 1.43 GHz). It would also be interesting to expand the cluster with Jetson Xavier (GPU: Volta with 512 cores, CPU: 8 ARM v8.2 64-bit cores).

It would be really great if GPU and Kubeflow were supported in the future, as microk8s would be a good fit for this hardware. But I can see that the dependencies are difficult, especially for Kubeflow.
