Trials succeeding but experiment hangs as metrics not collected


/kind bug

What steps did you take and what happened: I created an experiment with the StdOut metrics collector. The trial container completes and I can see the required metrics printed to stdout, but it looks like the metrics collector sidecar is not injected, and the experiment hangs on the first n trials.

What did you expect to happen: The trials would complete, the metrics would be collected and reported back, and the next n trials would start.

Anything else you would like to add:

Environment:

  • Katib version (check the Katib controller image version): release-0.12
  • Kubernetes version: (kubectl version):
Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-dispatcher", GitCommit:"2a8027f41d28b788b001389f3091c245cd0a9a60", GitTreeState:"clean", BuildDate:"2022-01-21T20:26:49Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6-gke.1500", GitCommit:"7ce0f9f1939dfc1aee910732e84cba03840df91e", GitTreeState:"clean", BuildDate:"2021-11-17T09:30:26Z", GoVersion:"go1.16.9b7", Compiler:"gc", Platform:"linux/amd64"}
  • OS (uname -a): Container-Optimized OS from Google

Impacted by this bug? Give it a 👍 We prioritize the issues with the most 👍

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

andreyvelich commented, Feb 16, 2022 (1 reaction)

@iantowey Try to specify imagePullPolicy: Always in the Katib controller and redeploy the controller. Maybe the image has been cached on your cluster.
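
A minimal sketch of that change, written as a fragment to merge into the controller's Deployment manifest. It assumes the controller runs as a Deployment with a container named `katib-controller`; both the name and the placement are assumptions here and should be checked against the manifests actually deployed on the cluster.

```yaml
# Fragment of the Katib controller Deployment (container name is an assumption).
# Only the imagePullPolicy field is the change suggested in this comment.
spec:
  template:
    spec:
      containers:
        - name: katib-controller
          # Always pull the image when a pod starts instead of
          # reusing a copy cached on the node.
          imagePullPolicy: Always
```

The policy is evaluated when a pod starts, so the controller pod has to be recreated for it to have any effect, which is why the redeploy is part of the suggestion.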

I believe the defaulter webhook is working, because I can see metrics collector spec in your Experiment:

Metrics Collector Spec:
    Collector:
      Kind:  StdOut

johnugeorge commented, Mar 23, 2022 (0 reactions)

Since the reported issue is unrelated to Katib, I am closing this issue.

/close
