[feature] generic viewer operator for managing user webapps in Kubeflow
Feature Area
/area backend
What feature would you like to see?
Besides TensorBoard, the KFP viewer controller supports generic viewers.
A viewer is a long-running container that exposes a webapp on a given port, along with the setup required to expose it through ingress (e.g. a VirtualService in Istio). It can help visualize the outputs of a pipeline component, but it can also be used outside of KFP, as in https://github.com/kubeflow/pipelines/issues/5651.
There are a few different use cases we are currently seeing:
- tensorboard (supported)
- file browser https://github.com/kubeflow/pipelines/issues/5651
- captum insights https://captum.ai/docs/captum_insights
- jupyter notebooks / vscode / rstudio (if we unify with Kubeflow notebooks controller)
All of them fit into this category, which suggests that a generic viewer operator that abstracts only ingress setup and lifecycle control would be a good fit. The configuration specific to each type of service can be supplied by users of the Viewer CRD.
Strawman Proposal
A generic viewer CRD like the following:
apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
spec:
  ingress:
    type: istio.virtualservice # maybe we can have more type supports
  containers:
  - name: main
    image: tensorflow:2.3
    command: ['python3', '-m']
    arguments: ['tensorboard', '--port', '8080', '--bind-all']
    envs:
    - name: AWS_SECRET
      valueFrom:
      - xxxx
    port: 8080
This custom resource will be used to set up the webapp for external access with:
- deployment
- service
- virtualservice
- authorizationpolicy
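To make the list above concrete, here is a minimal sketch of what the operator might generate for one Viewer. Everything in it is an assumption for illustration: the names `my-viewer` and `user-ns`, the `app` label, the gateway name, the URL prefix, and the `kubeflow-userid` header check are not part of the proposal.

```yaml
# Hypothetical operator output for a Viewer named "my-viewer" in namespace
# "user-ns". All names, labels, and paths are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-viewer
  namespace: user-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-viewer
  template:
    metadata:
      labels:
        app: my-viewer
    spec:
      containers:
      - name: main
        image: tensorflow:2.3      # copied from the Viewer's container spec
        ports:
        - containerPort: 8080      # the port declared on the Viewer
---
apiVersion: v1
kind: Service
metadata:
  name: my-viewer
  namespace: user-ns
spec:
  selector:
    app: my-viewer
  ports:
  - name: http
    port: 80
    targetPort: 8080
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-viewer
  namespace: user-ns
spec:
  hosts: ['*']
  gateways:
  - kubeflow/kubeflow-gateway      # assumed gateway name
  http:
  - match:
    - uri:
        prefix: /viewer/user-ns/my-viewer/   # assumed URL layout
    rewrite:
      uri: /
    route:
    - destination:
        host: my-viewer.user-ns.svc.cluster.local
        port:
          number: 80
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-viewer
  namespace: user-ns
spec:
  selector:
    matchLabels:
      app: my-viewer
  action: ALLOW
  rules:
  - when:
    - key: request.headers[kubeflow-userid]   # assumed identity header
      values: ['user@example.com']
```

All four objects could carry an owner reference to the Viewer, so that deleting the custom resource removes the whole group.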
The major value of the generic viewer operator is that it unifies the resources needed to make the webapp securely available to users. When the custom resource is created or deleted, the operator makes sure the whole group of resources is created, updated, or deleted accordingly.
I think the main controversial point to discuss is whether the viewer should encode domain knowledge about how to start each type of service. Given the number of different use cases we have seen, it sounds to me like we’d better leave that domain knowledge to a different layer of abstraction. Curious what others think.
What is the use case or pain point?
This also helps mitigate the problem that the Kubeflow community has two operators supporting these features: https://github.com/kubeflow/kubeflow/issues/5921.
Top GitHub Comments
Lastly, I’d also like to point out that an interesting feature would be to allow users to configure the replicas of the underlying Deployment.
This will essentially allow users to start/stop the underlying Pods, while still maintaining the CR.
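As a sketch of the replicas idea only: assuming a hypothetical `replicas` field on the Viewer spec that the operator copies onto the underlying Deployment, scaling to zero would stop the Pods while leaving the custom resource and its configuration in place. The field name and the metadata below are illustrative, not taken from the proposal.

```yaml
# Hypothetical: `replicas` is an assumed field, passed through to the Deployment.
apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
metadata:
  name: my-viewer        # illustrative name
spec:
  replicas: 0            # 0 stops the Pods; set back to 1 to start the viewer again
```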
So by taking all the above into consideration I’d propose the following iteration:
Strawman proposal v2
Would really like to hear your feedback. Also I believe another useful thing to discuss is how to handle the ports the container exposes and the underlying Service. Should we take for granted that the Service will only be sending traffic to Pod’s 8080 port?
One thing that will need some careful consideration with the generic viewer is how to deal with RBAC permissions. For example, if you would want to allow a user to create tensorboards, but not a file browser instance. To support this I think it will be necessary to define multiple Kinds for the different viewers, but have them share (most of) the reconciliation loop. This then also allows for some domain specific implementations as well. Adding a layer of abstraction above this controller would probably require another controller, partially defeating the purpose of a single unified controller. The different specs would look similar to the following:
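As a rough illustration of the RBAC argument only (these are not the specs the commenter refers to): assuming separate kinds exposed as the hypothetical resources `tensorboards` and `filebrowsers` in the `pipelines.kubeflow.org` group, a Role like the following would let a user manage TensorBoard viewers while granting nothing for file browsers. With a single generic `viewers` resource, this distinction could not be expressed in RBAC rules.

```yaml
# Hypothetical resource names; RBAC matches on resource type, not on spec contents.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tensorboard-editor   # illustrative name
  namespace: user-ns         # illustrative namespace
rules:
- apiGroups: ['pipelines.kubeflow.org']
  resources: ['tensorboards']
  verbs: ['get', 'list', 'watch', 'create', 'delete']
# No rule mentions `filebrowsers`; RBAC is additive, so those requests are denied.
```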