[SDK] Create API to get Trial metrics from Katib DB

/kind feature /area sdk

Our Katib Python SDK doesn’t have an API to get Trial metrics from the Katib DB. Currently, users can see Trial metrics only through the Katib UI. We should give users the ability to query metrics through the Katib SDK using the GetObservationLog gRPC API.
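To make the proposal concrete, here is a minimal sketch of what such an SDK helper could look like. It is not the SDK's actual API: `get_trial_metrics`, `TrialMetric`, and `FakeDBManagerStub` are hypothetical names, and the request and reply field names (`trial_name`, `observation_log`, `metric_logs`, `time_stamp`, `metric.name`, `metric.value`) are assumptions that should be checked against Katib's api.proto. The fake stub stands in for a real gRPC stub so the example runs without a cluster.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, List


@dataclass
class TrialMetric:
    """One metric observation for a Trial (hypothetical SDK type)."""
    name: str
    value: str
    timestamp: str


def get_trial_metrics(
    stub: Any,
    make_request: Callable[..., Any],
    trial_name: str,
    timeout: float = 30.0,
) -> List[TrialMetric]:
    """Call GetObservationLog on a DB Manager stub and flatten the reply.

    `stub` is anything exposing GetObservationLog(request, timeout=...).
    `make_request` builds the request message from keyword arguments.
    Field names are assumed, not confirmed by the issue text.
    """
    request = make_request(trial_name=trial_name)
    reply = stub.GetObservationLog(request, timeout=timeout)
    return [
        TrialMetric(log.metric.name, log.metric.value, log.time_stamp)
        for log in reply.observation_log.metric_logs
    ]


class FakeDBManagerStub:
    """Stand-in for a generated gRPC stub, used only for this example."""

    def GetObservationLog(self, request: Any, timeout: float = None) -> Any:
        logs = [
            SimpleNamespace(
                time_stamp="2022-11-18T00:00:00Z",
                metric=SimpleNamespace(name="accuracy", value="0.91"),
            )
        ]
        return SimpleNamespace(
            observation_log=SimpleNamespace(metric_logs=logs)
        )


# SimpleNamespace plays the role of the generated request message class.
metrics = get_trial_metrics(FakeDBManagerStub(), SimpleNamespace, "my-trial")
for m in metrics:
    print(m.timestamp, m.name, m.value)
```

With a real deployment, the fake stub and `SimpleNamespace` would be replaced by the generated DB Manager stub and request class, created over a gRPC channel to the DB Manager service.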

From a security perspective, a user can call this gRPC API from any namespace and for any experiment, since our DB Manager doesn’t have any auth checks, right? Should we investigate how to improve user isolation for Katib (the “multi-user mode” feature)? One solution could be to use Istio to allow traffic only from the appropriate user, as @apo-ger mentioned here: https://github.com/kubeflow/katib/pull/1983#issuecomment-1319674570.

What do you think, @johnugeorge @gaocegege @tenzen-y @anencore94 @kimwnasptd @apo-ger?


Love this feature? Give it a 👍 We prioritize the features with the most 👍

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Reactions: 1
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

3 reactions
kimwnasptd commented, Nov 18, 2022

@andreyvelich that’s a great feature!

Regarding the authn/authz part, I think this discussion will revolve around having programmatic client support for the DB Manager API Server. This is similar to how KFP allows Pods from other namespaces to use its API Server to perform CRUD tasks: https://github.com/kubeflow/pipelines/issues/5138.

And this is done by:

  1. Allowing everyone to talk to the DB Manager, but without setting the kubeflow-userid header (to avoid impersonation).
  2. The DB Manager will drop any requests that are not authenticated.
  3. In-cluster Pods that need to talk to the DB Manager will need to provide an audience-scoped ServiceAccount token.
  4. The DB Manager will need to validate the token.
  5. The DB Manager will then extract the identity (ServiceAccount name) from that token and perform a SubjectAccessReview.
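The steps above could be sketched roughly as follows. This is only an illustration of the flow, not the DB Manager's implementation: `authorize`, `TokenStatus`, the `AUDIENCE` value, and the two injected callables are hypothetical. `review_token` stands in for a Kubernetes TokenReview call and `check_access` for a SubjectAccessReview call. The sketch also assumes that any caller-supplied kubeflow-userid header is simply ignored, so identity comes only from the validated token.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical audience the ServiceAccount token must be scoped to.
AUDIENCE = "katib-db-manager"


@dataclass
class TokenStatus:
    """Result of validating a token (stand-in for a TokenReview status)."""
    authenticated: bool
    username: str = ""


class Unauthenticated(Exception):
    """The request carried no valid token (step 2)."""


class PermissionDenied(Exception):
    """The identity is valid but not allowed in the namespace (step 5)."""


def authorize(
    headers: Mapping[str, str],
    namespace: str,
    review_token: Callable[[str, str], TokenStatus],
    check_access: Callable[[str, str], bool],
) -> str:
    """Return the caller's identity, or raise if the request is rejected."""
    # Step 3: the caller must present a bearer token.
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise Unauthenticated("missing bearer token")

    # Step 4: validate the token against the expected audience.
    status = review_token(token, AUDIENCE)
    if not status.authenticated:
        raise Unauthenticated("token rejected")

    # Step 5: check that the extracted identity may access the namespace.
    if not check_access(status.username, namespace):
        raise PermissionDenied(f"{status.username} cannot access {namespace}")
    return status.username


# Fakes used only to exercise the flow without a cluster.
def fake_review_token(token: str, audience: str) -> TokenStatus:
    if token == "good-token" and audience == AUDIENCE:
        return TokenStatus(True, "system:serviceaccount:team-a:default-editor")
    return TokenStatus(False)


def fake_check_access(username: str, namespace: str) -> bool:
    return username.startswith(f"system:serviceaccount:{namespace}:")


identity = authorize(
    {"authorization": "Bearer good-token", "kubeflow-userid": "someone-else"},
    "team-a",
    fake_review_token,
    fake_check_access,
)
print(identity)
```

Passing the two checks in as callables keeps the flow testable on its own. In a real server they would be backed by the Kubernetes API.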

Then there’s also the discussion on how to use ServiceAccount tokens from outside the cluster. But that is a next step, once we have the in-cluster behavior above working.

1 reaction
anencore94 commented, Dec 5, 2022

https://docs.google.com/document/d/1TRUKUY1zCCMdgF-nJ7QtzRwifsoQop0V8UnRo-GWlpI/edit?disco=AAAAknO9PlM

To answer the question above, @andreyvelich: I’ve seen many companies build their own UI pages using several Kubeflow APIs, including Kubeflow Notebooks, Pipelines, and Katib. So if there were an HTTP server for Katib, many clients, including their own SDKs and UIs, could use those APIs much more easily.
