random-example cannot work


/kind bug

What steps did you take and what happened:

In the Katib web UI, I submitted https://github.com/kubeflow/katib/blob/7443f02c21/examples/v1alpha3/random-example.yaml as an experiment.

What did you expect to happen: The example should run successfully.

Anything else you would like to add: In each trial, the pod panics.

Error logs of each trial pod:

I1130 23:53:52.422917      18 main.go:78] INFO:root:Epoch[19] Train-accuracy=0.122044
I1130 23:53:52.422934      18 main.go:78] INFO:root:Epoch[19] Time cost=3.282
I1130 23:53:52.550241      18 main.go:78] INFO:root:Epoch[19] Validation-accuracy=0.113854
F1130 23:53:53.003408      18 main.go:94] Failed to wait for worker container: Process 6 hadn't completed: open /var/log/katib/6.pid: no such file or directory
goroutine 1 [running]:
github.com/kubeflow/katib/vendor/k8s.io/klog.stacks(0xc000186100, 0xc000250000, 0xa0, 0x256)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:830 +0xb8
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).output(0x129ca40, 0xc000000003, 0xc000210000, 0x1236476, 0x7, 0x5e, 0x0)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:781 +0x2d0
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).printf(0x129ca40, 0x3, 0xc77e24, 0x27, 0xc00008dee8, 0x1, 0x1)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:678 +0x14b
github.com/kubeflow/katib/vendor/k8s.io/klog.Fatalf(...)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:1209
main.main()
	/go/src/github.com/kubeflow/katib/cmd/metricscollector/v1alpha3/file-metricscollector/main.go:94 +0x279

Images I use:

docker images | grep suggestion
gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-hyperopt      latest               989d1ed70824        5 days ago          1.22GB

All other Katib components use images tagged v0.7.0.

Environment:

  • Kubeflow version:
  • Minikube version:
  • Kubernetes version (use kubectl version): 1.16.3
  • OS (e.g. from /etc/os-release):

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
janvdvegt commented, Feb 26, 2020

Yeah, the metrics collector captures the logs. The problem I had was that the limits were too wide, so the Pod didn’t get OOMKilled, but the node had SystemOOM warnings and SIGKILLed the container.

0 reactions
andreyvelich commented, Feb 26, 2020

@janvdvegt So you can see the metrics collector container on your training job? Try increasing the resources for your training job; that may help.
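
As a rough illustration of the suggestion above, resources for a training container are normally set through the standard Kubernetes `resources.requests` and `resources.limits` fields of the container spec. The fragment below is only a sketch: the container name, image, and all values are placeholders and are not taken from random-example.yaml. Where exactly this goes inside the experiment's trial template depends on the Katib version in use.

```yaml
# Hypothetical container entry inside the trial's Job pod spec.
# Name, image, and numbers are placeholders chosen for illustration.
containers:
  - name: training-container          # placeholder name
    image: example.com/training:tag   # placeholder image
    resources:
      requests:
        memory: "1Gi"    # amount the scheduler reserves on the node
        cpu: "500m"
      limits:
        memory: "2Gi"    # the container is OOMKilled if it exceeds this
        cpu: "1"
```

Keeping the memory request close to what the training process actually uses may help with the situation described in the first comment, since the scheduler then only places the pod on a node with that much memory available, instead of the node itself running out of memory.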


