metrics-collector service account doesn't exist in profile created namespace

/kind bug

Using 0.6.2, I tried to launch a Katib Experiment in the namespace associated with my profile. The metrics collector jobs are not able to start because of the following error:

Events:
  Type     Reason        Age                    From            Message
  ----     ------        ----                   ----            -------
  Warning  FailedCreate  2m22s (x5 over 4m53s)  job-controller  Error creating: pods "sauron-tune-191001-095415-rrfdmkpr-1569979500-" is forbidden: error looking up service account jlewi/metrics-collector: serviceaccount "metrics-collector" not found

It looks like the problem is that the metrics collector jobs are trying to use the metrics-collector service account, which is not created automatically in my namespace.

There are at least two ways to fix this:

  1. Have profile controller create that service account
  2. Use a different service account that is created by the profile controller

Workaround

To work around this issue, you can manually create the service account and role binding.

kubectl -n ${NAMESPACE} create serviceaccount metrics-collector

kubectl -n ${NAMESPACE} create rolebinding metrics-collector --clusterrole=metrics-collector --serviceaccount=${NAMESPACE}:metrics-collector
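
The same two objects can also be written as a manifest and applied declaratively, which is safe to re-run. The following is only a sketch: the function name render_metrics_collector_rbac is hypothetical, and it assumes, as the commands above do, that a ClusterRole named metrics-collector already exists in the cluster.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of Katib or Kubeflow): prints a ServiceAccount
# and a RoleBinding equivalent to the two kubectl create commands above.
# Assumes the ClusterRole "metrics-collector" already exists in the cluster.
render_metrics_collector_rbac() {
  ns="${1:?usage: render_metrics_collector_rbac NAMESPACE}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-collector
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-collector
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-collector
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: ${ns}
EOF
}

# Usage (needs cluster access, so it is left commented out here):
#   render_metrics_collector_rbac "${NAMESPACE}" | kubectl apply -f -
```

Printing the manifest separately from applying it lets you review the output before it reaches the cluster.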

/cc @johnugeorge @richardsliu

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
johnugeorge commented, Oct 2, 2019

The design doc also supports a push-based approach (logging directly from the training container): https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1alpha3/common_types.go#L133

The DB backend is made pluggable in this release (https://github.com/kubeflow/katib/pull/760). In order to support KFMD (or any other DB), we just need to implement the KatibDBInterface (https://github.com/kubeflow/katib/blob/master/pkg/db/v1alpha3/common/kdb.go#L7): basically the Init DB, Get metric, Create metric, and Delete metric functions.

Currently, MySQL is the only supported backend. Refer to: https://github.com/kubeflow/katib/blob/master/pkg/db/v1alpha3/mysql/mysql.go. It should be easy to integrate any new DB backend now.

0 reactions
k8s-ci-robot commented, Oct 10, 2019

@gaocegege: Closing this issue.

In response to this:

I think so.

/close
