Katib v1alpha2 pytorchjob-example.yaml is failing

Katib v1alpha2 pytorchjob-example.yaml is failing.

Please find the log below:

asis@paisky2:~/katib/katib/examples/v1alpha2$ kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha2/pytorchjob-example.yaml
experiment.kubeflow.org/random-experiment created
(reverse-i-search)`exper': kubectl -n kubeflow delete ^Cperiment  pytorchjob-example
asis@paisky2:~/katib/katib/examples/v1alpha2$ kubectl get experiment -n kubeflow
NAME                STATUS    AGE
random-experiment   Running   14s
asis@paisky2:~/katib/katib/examples/v1alpha2$ kubectl get pods -n kubeflow
NAME                                                    READY   STATUS    RESTARTS   AGE
katib-controller-7f985d6cd6-6cht6                       1/1     Running   17         19h
katib-db-b48df7777-f626v                                1/1     Running   0          19h
katib-manager-7946dd5984-2pszt                          1/1     Running   8          19h
katib-manager-rest-647f694b7d-c99vc                     1/1     Running   0          19h
katib-suggestion-bayesianoptimization-94c87dd64-gwmps   1/1     Running   0          19h
katib-suggestion-grid-58d9dfb5fd-ltw9r                  1/1     Running   0          19h
katib-suggestion-hyperband-778bb768c8-5k4hv             1/1     Running   0          19h
katib-suggestion-nasrl-d84fbb8f4-szzmr                  1/1     Running   0          19h
katib-suggestion-random-7f96c4d77b-llgb5                1/1     Running   0          19h
katib-ui-6cf97db464-85ndk                               1/1     Running   0          19h
random-experiment-8xsvj587-1564470900-rwwnt             0/1     Error     0          21s
random-experiment-dv2fzqvf-1564470900-ggkdv             0/1     Error     0          21s
random-experiment-fbjlt5tq-1564470900-7mc9x             0/1     Error     0          21s
asis@paisky2:~/katib/katib/examples/v1alpha2$ kubectl -n kubeflow logs random-experiment-8xsvj587-1564470900-rwwnt
I0730 07:15:05.651777       1 main.go:61] Experiment Name: random-experiment, Trial Name: random-experiment-8xsvj587, Job Kind: PyTorchJob
F0730 07:15:08.701325       1 main.go:75] Failed to collect logs: No Pods are found in Trial random-experiment-8xsvj587
goroutine 1 [running]:
github.com/kubeflow/katib/vendor/k8s.io/klog.stacks(0xc000392500, 0xc00015e1c0, 0x78, 0xe0)
        /go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:830 +0xb1
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).output(0x1e83b20, 0xc000000003, 0xc0003676c0, 0x1e0ea00, 0x7, 0x4b, 0x0)
        /go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:781 +0x25e
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).printf(0x1e83b20, 0x7ffe00000003, 0x1255157, 0x1a, 0xc0002a1f18, 0x1, 0x1)
        /go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:678 +0x14e
github.com/kubeflow/katib/vendor/k8s.io/klog.Fatalf(...)
        /go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:1209
main.main()
        /go/src/github.com/kubeflow/katib/cmd/metricscollector/v1alpha2/main.go:75 +0x4e7
asis@paisky2:~/katib/katib/examples/v1alpha2$
  • Kubeflow version: 0.6.1
  • Kubernetes version (from kubectl version): v1.15.1
  • OS (from /etc/os-release): Ubuntu 18.04.2 LTS
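
To pull the failing trial pods out of a listing like the one above, a small filter over the `kubectl get pods` output can help. The following is a minimal sketch and not part of the original report: the function name `failed_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) shown in the log.

```shell
# Sketch: print names of pods whose STATUS is neither Running nor Completed.
# Reads default `kubectl get pods` output on stdin (header on line 1).
# failed_pods is a hypothetical helper name, not a kubectl feature.
failed_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -n kubeflow | failed_pods
```

Each name it prints can then be passed to `kubectl -n kubeflow logs`, as was done above for random-experiment-8xsvj587-1564470900-rwwnt.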

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

johnugeorge commented, Jul 30, 2019 (1 reaction)

You could install the PyTorch CRD and operators from https://github.com/kubeflow/manifests/tree/master/pytorch-job using Kustomize.
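
The suggestion above implies that the PyTorchJob custom resource definition may be absent from the cluster. One way to check before installing anything is to look for it in the CRD list. The sketch below is an illustration only: `crd_installed` is a hypothetical helper, and the CRD name `pytorchjobs.kubeflow.org` is an assumption based on the usual naming of the Kubeflow PyTorch operator; it is not stated in the issue.

```shell
# Sketch: report whether a named CRD appears in a list of CRD resource names.
# Reads `kubectl get crd -o name` style lines on stdin, for example:
#   customresourcedefinition.apiextensions.k8s.io/pytorchjobs.kubeflow.org
# crd_installed is a hypothetical helper; the CRD name is an assumption.
crd_installed() {
    # $1: CRD name to look for; compare it with the text after the last "/".
    awk -F/ -v want="$1" '$NF == want { found = 1 } END { exit !found }'
}

# Usage against a live cluster (not executed here):
#   kubectl get crd -o name | crd_installed pytorchjobs.kubeflow.org \
#     && echo "PyTorchJob CRD present" \
#     || echo "PyTorchJob CRD missing"
```

The function exits with status 0 when the name is found and 1 otherwise, so it can gate a later step such as creating the experiment.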

k8s-ci-robot commented, Oct 10, 2019 (0 reactions)

@gaocegege: Closing this issue.

In response to this:

/close

It is stale. But feel free to ask here.


