
additionalMetricNames does not seem to work for TensorFlowEvent


/kind bug

What steps did you take and what happened: I have an Experiment that looks like this:

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: hyperparameter-tuning
spec:
  objective:
    type: minimize
    goal: 0.00
    objectiveMetricName: epoch_loss
    additionalMetricNames:
    - epoch_binary_accuracy
  algorithm:
    algorithmName: random
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: /tmp/tensorboard_logs/validation
        kind: Directory
    collector:
      kind: TensorFlowEvent

Individual trials do not seem to report the objective metric or any of the additional metrics. However, if I remove additionalMetricNames altogether, everything seems to work fine.

What did you expect to happen: All metrics to be populated.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Kubeflow version:
  • Minikube version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
issue-label-bot[bot] commented, May 14, 2020

Issue-Label Bot is automatically applying the labels:

Label Probability
area/katib 0.50

Please mark this comment with 👍 or 👎 to give our bot feedback! Links: app homepage, dashboard and code for this bot.

0 reactions
andreyvelich commented, May 18, 2020

@sadeel Thank you for reporting this!

Yes, I think we have a bug there. The webhook passes the additional metrics to the metrics collector container joined with “;” (https://github.com/kubeflow/katib/blob/master/pkg/webhook/v1alpha3/pod/inject_webhook.go#L177), but the TF Event Metrics Collector splits them on “,”: https://github.com/kubeflow/katib/blob/master/cmd/metricscollector/v1alpha3/tfevent-metricscollector/main.py#L42.
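The mismatch can be reproduced in a few lines of Python. This is a minimal sketch, not Katib's actual code: the variable names are hypothetical, and it assumes the webhook passes the objective metric followed by the additional metrics as a single “;”-joined string, as described in the comment above.

```python
# Hypothetical reproduction of the delimiter mismatch (not Katib source code).
objective = "epoch_loss"
additional = ["epoch_binary_accuracy"]

# Webhook side (Go in Katib): metric names joined with ";"
joined = ";".join([objective] + additional)

# Collector side (Python in Katib): metric names split with ","
parsed = joined.split(",")
print(parsed)  # ['epoch_loss;epoch_binary_accuracy']

# With no additional metrics the string contains no ";", so it parses correctly
print(objective.split(","))  # ['epoch_loss']
```

Under this assumption the collector would look for one metric literally named "epoch_loss;epoch_binary_accuracy", which presumably matches no TensorBoard tag, so nothing is reported. An Experiment with only objectiveMetricName produces a single name with no delimiter, which is consistent with the reporter's observation that removing additionalMetricNames makes it work.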

I will create a PR to fix it.
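A fix needs both sides to agree on one delimiter. The following is a minimal Python sketch of that idea, not the actual patch: METRIC_DELIMITER, encode_metric_names and parse_metric_names are hypothetical names, and in Katib the encoding side lives in the Go webhook rather than in Python.

```python
# Hypothetical sketch: one shared delimiter constant used for both encoding and parsing.
METRIC_DELIMITER = ";"  # assumption: keep the delimiter the webhook already uses


def encode_metric_names(objective, additional):
    """Join the objective metric and any additional metrics into one argument string."""
    return METRIC_DELIMITER.join([objective] + list(additional))


def parse_metric_names(arg):
    """Split the argument string back into metric names, ignoring empty entries."""
    return [name.strip() for name in arg.split(METRIC_DELIMITER) if name.strip()]


arg = encode_metric_names("epoch_loss", ["epoch_binary_accuracy"])
print(parse_metric_names(arg))  # ['epoch_loss', 'epoch_binary_accuracy']
```

Defining the delimiter in one place keeps the producer and consumer from drifting apart again; the objective-only case still round-trips to a single name.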
