additionalMetricNames does not seem to work for TensorFlowEvent
/kind bug
What steps did you take and what happened: I have an experiment that looks like this:
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: hyperparameter-tuning
spec:
  objective:
    type: minimize
    goal: 0.00
    objectiveMetricName: epoch_loss
    additionalMetricNames:
      - epoch_binary_accuracy
  algorithm:
    algorithmName: random
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: /tmp/tensorboard_logs/validation
        kind: Directory
    collector:
      kind: TensorFlowEvent
Individual trials do not seem to report the objective metric or any of the additional metrics. However, if I remove additionalMetricNames altogether, everything seems to work fine.
What did you expect to happen: All metrics to be populated.
Environment:
- Kubeflow version:
- Minikube version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
Issue Analytics
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
@sadeel Thank you for reporting this!
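Yes, I think we have a bug there. The pod webhook joins the objective and additional metric names with “;” before passing them to the metrics collector container (https://github.com/kubeflow/katib/blob/master/pkg/webhook/v1alpha3/pod/inject_webhook.go#L177), but the TF Event Metrics Collector splits them on “,” (https://github.com/kubeflow/katib/blob/master/cmd/metricscollector/v1alpha3/tfevent-metricscollector/main.py#L42). With additionalMetricNames set, the collector therefore sees a single name such as epoch_loss;epoch_binary_accuracy that matches no metric, so nothing is reported. With only the objective metric there is no separator in the string, which is why removing additionalMetricNames makes it work.
I will create a PR to fix it.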
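A minimal Python sketch of the separator mismatch described in the comment above, assuming the collector keeps only event-file tags whose names exactly match a requested metric name. The helpers join_metric_names, parse_metric_names and collect are hypothetical and only mimic the described behavior; they are not Katib's actual code.

```python
# Hypothetical helpers that mimic the two sides of the separator mismatch.
# They are illustrative only and are not Katib's actual functions.

WEBHOOK_SEPARATOR = ";"    # separator the pod webhook uses when joining names
COLLECTOR_SEPARATOR = ","  # separator the TF Event collector splits on (the bug)


def join_metric_names(objective, additional, sep=WEBHOOK_SEPARATOR):
    """Join the objective and additional metric names into one argument string."""
    return sep.join([objective, *additional])


def parse_metric_names(arg, sep):
    """Split the argument string back into individual metric names."""
    return [name for name in arg.split(sep) if name]


def collect(tags_in_event_file, metric_names):
    """Keep only the event-file tags that exactly match a requested metric name."""
    wanted = set(metric_names)
    return [tag for tag in tags_in_event_file if tag in wanted]


if __name__ == "__main__":
    tags = ["epoch_loss", "epoch_binary_accuracy"]

    # Objective only: no separator appears, so the "," split still works.
    arg = join_metric_names("epoch_loss", [])
    print(collect(tags, parse_metric_names(arg, COLLECTOR_SEPARATOR)))
    # -> ['epoch_loss']

    # With additionalMetricNames: the string is joined with ";" but split on ",".
    arg = join_metric_names("epoch_loss", ["epoch_binary_accuracy"])
    print(parse_metric_names(arg, COLLECTOR_SEPARATOR))
    # -> ['epoch_loss;epoch_binary_accuracy']
    print(collect(tags, parse_metric_names(arg, COLLECTOR_SEPARATOR)))
    # -> []

    # Splitting on the same separator the webhook uses restores both metrics.
    print(collect(tags, parse_metric_names(arg, WEBHOOK_SEPARATOR)))
    # -> ['epoch_loss', 'epoch_binary_accuracy']
```

The only thing that matters for the fix is that the webhook and the collector agree on one separator. Which side the actual PR changes is not stated in this thread.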