
Trials succeed but experiment hangs because metrics are not collected


/kind bug

What steps did you take and what happened: I created an experiment with the stdout metrics collector. The trial container completes and I can see the required metrics printed to stdout, but the metrics collector sidecar does not seem to be injected, and the experiment hangs on the first n trials.

What did you expect to happen: the trials would complete, the metrics would be collected and reported back, and the next n trials would start.
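For context, the StdOut collector extracts metrics from lines the trial prints in the form `name=value`. The Python sketch below mimics that parsing so you can check whether a trial's output is in a format the collector can read. It assumes the collector's default filter is the regular expression `([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)` (verify against the Katib source for your release). The helper `parse_metrics` and the metric names `accuracy` and `loss` are hypothetical.

```python
import re

# Assumed to mirror Katib's default StdOut filter: "<name>=<number>".
DEFAULT_FILTER = re.compile(r"([\w|-]+)\s*=\s*([+-]?\d*(\.\d+)?([Ee][+-]?\d+)?)")


def parse_metrics(stdout_text, metric_names):
    """Collect values for the requested metric names from trial stdout.

    Returns a dict mapping each metric name to the list of values found,
    in the order they were printed.
    """
    found = {name: [] for name in metric_names}
    for line in stdout_text.splitlines():
        # A single line may contain several "name=value" pairs.
        for match in DEFAULT_FILTER.finditer(line):
            name, value = match.group(1), match.group(2)
            if name not in found or not value:
                continue
            try:
                found[name].append(float(value))
            except ValueError:
                # e.g. a bare sign such as "loss=-" matches but is not a number.
                continue
    return found


# Hypothetical trial output.
sample_stdout = """epoch 1
accuracy=0.81
loss=0.52
epoch 2
accuracy=0.86 loss=0.41
"""

print(parse_metrics(sample_stdout, ["accuracy", "loss"]))
```

If the trial's lines parse cleanly here but nothing is recorded, the output format is probably not the culprit, which points back at the sidecar.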

Anything else you would like to add:

Environment:

Katib version (check the Katib controller image version): release-0.12
Kubernetes version: (kubectl version):

Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-dispatcher", GitCommit:"2a8027f41d28b788b001389f3091c245cd0a9a60", GitTreeState:"clean", BuildDate:"2022-01-21T20:26:49Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6-gke.1500", GitCommit:"7ce0f9f1939dfc1aee910732e84cba03840df91e", GitTreeState:"clean", BuildDate:"2021-11-17T09:30:26Z", GoVersion:"go1.16.9b7", Compiler:"gc", Platform:"linux/amd64"}

OS (uname -a): Container-Optimized OS from Google


Issue Analytics

Created: 2 years ago
Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction

andreyvelich commented, Feb 16, 2022

@iantowey Try setting imagePullPolicy: Always on the Katib controller and redeploy it. The image may have been cached on your cluster.
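One way to apply this suggestion is a strategic merge patch on the controller Deployment. The Python sketch below only builds the patch body. It assumes the Deployment and its container are both named `katib-controller` in the `kubeflow` namespace (the names used by the standard Katib manifests, but check them on your cluster); the helper `image_pull_policy_patch` is hypothetical.

```python
import json


def image_pull_policy_patch(container_name, policy="Always"):
    """Build a strategic merge patch that sets imagePullPolicy on one container.

    Strategic merge patches merge the containers list by "name", so only the
    named container is changed; the rest of the Deployment is left alone.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container_name, "imagePullPolicy": policy}
                    ]
                }
            }
        }
    }


# Assumed container name; confirm it with `kubectl get deployment -o yaml`.
patch = image_pull_policy_patch("katib-controller")
print(json.dumps(patch))
```

The printed JSON could then be passed to `kubectl patch deployment katib-controller -n kubeflow --patch '<json>'`. Changing the pod template triggers a new rollout by itself; if the policy is already Always, `kubectl rollout restart deployment/katib-controller -n kubeflow` recreates the pod and forces a fresh pull.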

I believe the defaulter webhook is working, because I can see the metrics collector spec in your Experiment:

Metrics Collector Spec:
    Collector:
      Kind:  StdOut
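The `Metrics Collector Spec` block above is how `kubectl describe` renders the Experiment's `spec.metricsCollectorSpec.collector.kind` field. As a rough sketch, assuming the Experiment was exported with `kubectl get experiment <name> -o json`, the same check can be scripted in Python. The inline JSON is a hypothetical, trimmed example and `collector_kind` is an illustrative helper.

```python
import json


def collector_kind(experiment):
    """Return spec.metricsCollectorSpec.collector.kind, or None if it is unset."""
    spec = experiment.get("spec", {})
    collector = spec.get("metricsCollectorSpec", {}).get("collector", {})
    return collector.get("kind")


# Hypothetical, trimmed Experiment as returned by `kubectl get experiment -o json`.
experiment_json = """
{
  "apiVersion": "kubeflow.org/v1beta1",
  "kind": "Experiment",
  "metadata": {"name": "example-experiment"},
  "spec": {"metricsCollectorSpec": {"collector": {"kind": "StdOut"}}}
}
"""

experiment = json.loads(experiment_json)
print(collector_kind(experiment))  # StdOut
```

A populated `kind` suggests defaulting ran, but on its own it does not show whether the sidecar was actually injected into the trial pods.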

0 reactions

johnugeorge commented, Mar 23, 2022

Since the reported issue is unrelated to Katib, I'm closing this issue.

/close
