metrics-collector service account doesn't exist in profile created namespace
/kind bug
Using 0.6.2, I tried to launch a Katib Experiment in the namespace associated with my profile. The metrics collector jobs cannot start because of the following error:
Events:
  Type     Reason        Age                    From            Message
  ----     ------        ----                   ----            -------
  Warning  FailedCreate  2m22s (x5 over 4m53s)  job-controller  Error creating: pods "sauron-tune-191001-095415-rrfdmkpr-1569979500-" is forbidden: error looking up service account jlewi/metrics-collector: serviceaccount "metrics-collector" not found
It looks like the problem is that the metrics collector jobs try to use the metrics-collector service account, which is not created automatically in my namespace.
There are at least two ways to fix this:
- Have the profile controller create that service account.
- Use a different service account that is already created by the profile controller.
Workaround
To work around this issue, you can manually create the service account and role binding in your namespace.
kubectl -n ${NAMESPACE} create serviceaccount metrics-collector
kubectl -n ${NAMESPACE} create rolebinding metrics-collector --clusterrole=metrics-collector --serviceaccount=${NAMESPACE}:metrics-collector
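The same workaround can also be expressed declaratively. The following is a minimal sketch, not part of the original report: the function name render_metrics_collector_manifest is hypothetical, and it assumes, as the rolebinding command above does, that a ClusterRole named metrics-collector already exists in the cluster. It only prints the manifests, so nothing changes until the output is applied.

```shell
#!/bin/sh
# Hypothetical helper: prints a ServiceAccount and a RoleBinding for the
# given namespace. Assumes a ClusterRole named "metrics-collector" exists,
# as the imperative workaround commands do.
render_metrics_collector_manifest() {
  ns="${1:?namespace required}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-collector
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-collector
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-collector
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: ${ns}
EOF
}
```

One possible usage is to pipe the output into kubectl: render_metrics_collector_manifest "${NAMESPACE}" | kubectl apply -f -. Unlike kubectl create, kubectl apply can be re-run safely if the objects already exist.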
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (6 by maintainers)
Top GitHub Comments
The design doc also supports a push-based approach (logging directly from the training container): https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1alpha3/common_types.go#L133
The DB backend is made pluggable in this release (https://github.com/kubeflow/katib/pull/760). In order to support KFMD (or any other DB), we just need to implement the KatibDBInterface (https://github.com/kubeflow/katib/blob/master/pkg/db/v1alpha3/common/kdb.go#L7): basically, the Init DB, Get metric, Create metric, and Delete metric functions.
Currently, MySQL is the only supported backend (see https://github.com/kubeflow/katib/blob/master/pkg/db/v1alpha3/mysql/mysql.go). It should be easy to integrate any new DB backend now.
@gaocegege: Closing this issue.