[feature] generic viewer operator for managing user webapps in Kubeflow

Feature Area

/area backend

What feature would you like to see?

Besides TensorBoard, the KFP viewer controller should support generic viewers.

A viewer is a long-running container that exposes a web app on a given port, together with the setup required to expose it through ingress (e.g. a VirtualService in Istio). It can help visualize the outputs of a pipeline component, but it can also be used outside of KFP, as in https://github.com/kubeflow/pipelines/issues/5651.

There are a few different use-cases we are currently getting:

All of them fit into this category, which suggests that a generic viewer operator that abstracts only ingress setup and lifecycle control would be a good fit. The configuration specific to each type of service we want to expose can be supplied by users of the Viewer CRD.

Strawman Proposal

A generic viewer CRD like the following:

apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
spec:
  ingress:
    type: istio.virtualservice # maybe we can support more types
  containers:
  - name: main
    image: tensorflow:2.3
    command: ['python3', '-m']
    arguments: ['tensorboard', '--port', '8080', '--bind-all']
    envs:
    - name: AWS_SECRET
      valueFrom:
      - xxxx
    port: 8080

This custom resource will be used to set up the web app for external access with:

  • deployment
  • service
  • virtualservice
  • authorizationpolicy

The main value of the generic viewer operator is that it unifies the resources needed to make the web app securely available to users. When the custom resource is created, updated, or deleted, the operator makes sure the whole group of resources is created, updated, or deleted with it.
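As a rough illustration of that fan-out, the sketch below expands a strawman Viewer spec into the four child resources as plain Python dicts. It is not the KFP controller's implementation. The function name, the `viewer` label, the `/viewer/<namespace>/<name>/` URL prefix, the `kubeflow/kubeflow-gateway` gateway, Service port 80, and the AuthorizationPolicy rule are all assumptions made for the example.

```python
from typing import Any


def render_viewer_resources(name: str, namespace: str, spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand a strawman Viewer spec into its four child resources (as plain dicts)."""
    if spec["ingress"]["type"] != "istio.virtualservice":
        raise ValueError(f"unsupported ingress type: {spec['ingress']['type']}")

    main = spec["containers"][0]
    port = main["port"]
    labels = {"viewer": name}  # hypothetical label tying the resources together

    def meta() -> dict[str, Any]:
        return {"name": name, "namespace": namespace, "labels": dict(labels)}

    # Map the strawman field names onto the core/v1 Container field names.
    container = {
        "name": main["name"],
        "image": main["image"],
        "command": main.get("command", []),
        "args": main.get("arguments", []),
        "env": main.get("envs", []),  # passed through unchanged
        "ports": [{"containerPort": port}],
    }
    deployment = {
        "apiVersion": "apps/v1", "kind": "Deployment", "metadata": meta(),
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": dict(labels)},
            "template": {"metadata": {"labels": dict(labels)},
                         "spec": {"containers": [container]}},
        },
    }
    service = {
        "apiVersion": "v1", "kind": "Service", "metadata": meta(),
        "spec": {"selector": dict(labels),
                 "ports": [{"name": "http", "port": 80, "targetPort": port}]},
    }
    prefix = f"/viewer/{namespace}/{name}/"  # hypothetical URL layout
    virtual_service = {
        "apiVersion": "networking.istio.io/v1beta1", "kind": "VirtualService", "metadata": meta(),
        "spec": {
            "hosts": ["*"],
            "gateways": ["kubeflow/kubeflow-gateway"],  # assumed gateway name
            "http": [{
                "match": [{"uri": {"prefix": prefix}}],
                "rewrite": {"uri": "/"},
                "route": [{"destination": {
                    "host": f"{name}.{namespace}.svc.cluster.local",
                    "port": {"number": 80},
                }}],
            }],
        },
    }
    authorization_policy = {
        "apiVersion": "security.istio.io/v1beta1", "kind": "AuthorizationPolicy", "metadata": meta(),
        "spec": {
            "selector": {"matchLabels": dict(labels)},
            "action": "ALLOW",
            # Placeholder rule: only accept traffic from the ingress gateway namespace.
            "rules": [{"from": [{"source": {"namespaces": ["istio-system"]}}]}],
        },
    }
    return [deployment, service, virtual_service, authorization_policy]
```

A real operator would also set owner references on each child so that deleting the Viewer garbage-collects the group; that part is omitted here.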

I think the most controversial point to discuss is whether the viewer should encode domain knowledge about each type of service it starts. Given the number of different use cases we have seen, it sounds to me like we’d better leave that domain knowledge to a different layer of abstraction. Curious what others think about that.

What is the use case or pain point?

This also helps mitigate the problem that the Kubeflow community has two operators supporting these features: https://github.com/kubeflow/kubeflow/issues/5921.


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 8
  • Comments: 14 (12 by maintainers)

Top GitHub Comments

2 reactions
kimwnasptd commented, May 25, 2021

Lastly, I’d also like to point out that an interesting feature would be to allow users to configure the replicas of the underlying Deployment.

This will essentially allow users to start/stop the underlying Pods, while still maintaining the CR.
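A minimal sketch of that start/stop behaviour, assuming the Viewer CR gains a `spec.replicas` field that the operator copies onto the Deployment. Both helper names and the default of 1 replica are hypothetical.

```python
import copy
from typing import Any


def set_viewer_running(viewer: dict[str, Any], running: bool) -> dict[str, Any]:
    """Return a copy of the Viewer CR with spec.replicas set to 1 (start) or 0 (stop)."""
    updated = copy.deepcopy(viewer)
    updated.setdefault("spec", {})["replicas"] = 1 if running else 0
    return updated


def desired_deployment_replicas(viewer: dict[str, Any]) -> int:
    """Replica count the operator would copy onto the Deployment (assumed default: 1)."""
    return viewer.get("spec", {}).get("replicas", 1)
```

Stopping a viewer then only changes one field: the CR, and with it the ingress configuration, stays in place while the Pods go away.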

So by taking all the above into consideration I’d propose the following iteration:

Strawman proposal v2

apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
spec:
  ingress:
    type: istio.virtualservice # maybe we can have more types
    pathRewrite: /
    httpHeaders:
    - name: X-Forwarded-Prefix
      value: /tensorboard/kubeflow/tb-instance
  replicas: 1
  template:
    spec:
      containers:
      - name: main
        image: tensorflow:2.3
        command: ['python3', '-m']
        arguments: ['tensorboard', '--port', '8080', '--bind-all']
        envs:
        - name: AWS_SECRET
          valueFrom:
          - xxxx
        port: 8080

I would really like to hear your feedback. I also believe another useful thing to discuss is how to handle the ports the container exposes and the underlying Service. Should we take for granted that the Service will only send traffic to port 8080 of the Pod?

1 reaction
DavidSpek commented, Jun 16, 2021

One thing that will need some careful consideration with the generic viewer is how to deal with RBAC permissions. For example, you might want to allow a user to create TensorBoards but not a file browser instance. To support this, I think it will be necessary to define multiple Kinds for the different viewers, but have them share (most of) the reconciliation loop. This then also allows for some domain-specific implementations. Adding a layer of abstraction above this controller would probably require another controller, partially defeating the purpose of a single unified controller. The different specs would look similar to the following:

apiVersion: viewer.kubeflow.org/v1beta2
kind: Tensorboard
....
apiVersion: viewer.kubeflow.org/v1beta2
kind: Filebrowser
....
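To illustrate why separate Kinds help here: Kubernetes RBAC rules are scoped by API group and resource name, so one Kind per viewer type lets a Role allow one and omit the other. The sketch below builds such a Role as a plain dict. The plural resource names `tensorboards` and `filebrowsers`, the helper name, and the verb list are assumptions for the example.

```python
from typing import Any

VIEWER_API_GROUP = "viewer.kubeflow.org"  # group taken from the example above


def viewer_role(name: str, namespace: str, resources: list[str]) -> dict[str, Any]:
    """Build an RBAC Role granting edit access to the listed viewer resources only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [VIEWER_API_GROUP],
            "resources": sorted(resources),  # e.g. the hypothetical plural "tensorboards"
            "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
        }],
    }


# A user bound to this Role can manage Tensorboard objects but not Filebrowser ones.
role = viewer_role("tensorboard-editor", "team-a", ["tensorboards"])
```

With a single generic Viewer Kind, the same rule would have to grant access to every viewer type at once.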