The file metrics collector example Docker image is out of sync with the code
/kind bug
What steps did you take and what happened: The trial image docker.io/liuhougangxa/pytorch-mnist:1.0 used in https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metricscollector-example.yaml is out of date with respect to https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metrics-collector/mnist.py.
The mnist.py in the Docker image:

    def test(args, model, device, test_loader, epoch):
        model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
                pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
                correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        logging.info('\n{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n'.format(float(correct) / len(test_loader.dataset), test_loss))
Here the logging format is {{metricName: accuracy, metricValue: {:.4f}}} (the doubled braces render as single literal braces in the output), so the file collector cannot parse it correctly.
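To make the mismatch concrete, here is a minimal sketch of what that format string actually writes and why a simple parser would miss it. It assumes the collector looks for name=value pairs; that pattern is my assumption for illustration and is not taken from the Katib source. ASSUMED_PATTERN and parse_metrics are hypothetical names, and the metric values are made up.

```python
import re

# Format string copied from the test() function in the image's mnist.py.
IMAGE_FORMAT = (
    "\n{{metricName: accuracy, metricValue: {:.4f}}};"
    "{{metricName: loss, metricValue: {:.4f}}}\n"
)

# Hypothetical "name=value" pattern, assumed here for illustration only;
# it is not copied from the Katib source.
ASSUMED_PATTERN = re.compile(r"([\w-]+)\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_metrics(text):
    """Return {name: value} for every "name=value" pair found in text."""
    return {name: float(value) for name, value in ASSUMED_PATTERN.findall(text)}


# Sample values chosen for illustration, not real trial results.
image_line = IMAGE_FORMAT.format(0.9876, 0.0432)
plain_line = "accuracy={:.4f}\nloss={:.4f}".format(0.9876, 0.0432)

print(repr(image_line))           # doubled braces come out as single braces
print(parse_metrics(image_line))  # no "=" in the line, so nothing is parsed
print(parse_metrics(plain_line))  # both metrics are recovered
```

Under that assumption, the line produced by the image contains no name=value pair at all, so a collector of this kind would report no metrics for the trial.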
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
Sorry for blocking you. I updated the image in https://github.com/kubeflow/katib/pull/947
@johnugeorge: Closing this issue.