Uber Issue: KFServing admission hook causing widespread issues because it's a global admission hook
/kind bug
We are getting lots of reports of problems caused by the KFServing admission hook being unavailable, which prevents pods from being created. The error message looks like the following:
4m58s Warning FailedCreate replicaset/activator-5484756f7b Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
Here’s my understanding:
- Currently admission hooks cannot be scoped by label, so a pod admission hook is applied to all pods.
- The KFServing admission hook is therefore called for all pods; the hook itself checks whether the pod belongs to a KFServing resource and, if it does, applies the mutation.
- However, if the KFServing webhook deployment is unavailable, pod creation can be blocked.
- For a variety of reasons we are reaching a deadlock state where:
  - The webhook is defined but the deployment for the hook is not, so calls to the admission hook fail.
  - Pod creation now fails because the webhook cannot be called.
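The failure mode described above can be sketched as a toy model. This is not Kubernetes API server code; the `Webhook` class, the `pod_is_admitted` function, and the `example.com/kfserving` label are hypothetical names for illustration. It assumes the webhook uses `failurePolicy: Fail`, which is consistent with the error message but not stated in the report.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Webhook:
    """Toy stand-in for one entry in a mutating webhook configuration."""
    name: str
    failure_policy: str = "Fail"  # assumed; Kubernetes accepts "Fail" or "Ignore"
    # None models "no selector": the webhook is called for every pod.
    required_label: Optional[str] = None
    backend_available: bool = True


def pod_is_admitted(pod_labels: Dict[str, str], hook: Webhook) -> bool:
    """Return True if pod creation would proceed in this simplified model."""
    selected = hook.required_label is None or hook.required_label in pod_labels
    if not selected:
        return True  # the webhook is never called for this pod
    if hook.backend_available:
        return True  # the webhook answers; assume it allows the pod
    # The webhook was selected but cannot be reached.
    return hook.failure_policy == "Ignore"


unrelated_pod = {"app": "activator"}

# Global hook with its backing deployment missing: unrelated pods are blocked.
global_hook = Webhook(name="pod-mutator", backend_available=False)
print(pod_is_admitted(unrelated_pod, global_hook))

# Same outage, but the hook only selects pods carrying a (hypothetical) label.
scoped_hook = Webhook(
    name="pod-mutator",
    required_label="example.com/kfserving",
    backend_available=False,
)
print(pod_is_admitted(unrelated_pod, scoped_hook))
```

In this model the first call reports the pod as rejected and the second as admitted, which is the difference between a global hook and a scoped one during an outage.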
Issue Analytics
- Created 4 years ago
- Reactions: 3
- Comments: 43 (17 by maintainers)
Top GitHub Comments
@maganaluis We need to use an object selector on the mutating webhook configuration so that only KFServing-labelled pods go through the KFServing pod mutator. The problem is that the object selector is only supported on Kubernetes 1.15+, while Kubeflow’s minimum requirement is still Kubernetes 1.14. If you are on Kubernetes 1.15+, you can use the following command to solve the issue.
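The command itself did not survive in this copy of the comment. As a minimal sketch of what such a patch might contain, the snippet below builds a patch body that adds an `objectSelector` to the pod mutator webhook. The webhook name is taken from the error message above; the label key is an assumption and should be checked against the labels your KFServing version actually puts on its pods.

```python
import json

# Webhook name as it appears in the error message in the report.
WEBHOOK_NAME = "inferenceservice.kfserving-webhook-server.pod-mutator"

# ASSUMPTION: label key carried by KFServing pods; verify before using.
POD_LABEL_KEY = "serving.kubeflow.org/inferenceservice"


def object_selector_patch(webhook_name: str, label_key: str) -> dict:
    """Build a patch body that limits one webhook to pods having label_key."""
    return {
        "webhooks": [
            {
                "name": webhook_name,
                "objectSelector": {
                    "matchExpressions": [
                        {"key": label_key, "operator": "Exists"}
                    ]
                },
            }
        ]
    }


patch = object_selector_patch(WEBHOOK_NAME, POD_LABEL_KEY)
print(json.dumps(patch, indent=2))
```

The printed JSON could then be passed to `kubectl patch mutatingwebhookconfiguration <config-name> --patch '<json>'`, where `<config-name>` is whatever name your cluster uses for the KFServing configuration. How the `webhooks` list is merged depends on the patch type, so it is worth inspecting the resulting configuration afterwards.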
Possible fixes
1. Change the namespaceSelector to be opt-in; match namespaces with specific labels. Ref: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-namespaceselector
Possible Work Arounds
Add the label control-plane to the kubeflow namespace
A possible recipe
- Get the inference spec
- Change the matchSelector
- Apply it
- Label any namespaces in which you want to use KFServing as
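An opt-in namespaceSelector, as proposed under "Possible fixes", might look roughly like the sketch below. The label `example.com/kfserving-enabled` is a hypothetical placeholder, since the label to use is not given in the text above, and `namespace_matches` only mimics `matchLabels` semantics for illustration.

```python
import json
from typing import Dict

# HYPOTHETICAL opt-in label; the source does not say which label to use.
OPT_IN_LABEL = {"example.com/kfserving-enabled": "true"}


def opt_in_namespace_selector(labels: Dict[str, str]) -> dict:
    """Build a namespaceSelector that matches only namespaces with these labels."""
    return {"namespaceSelector": {"matchLabels": dict(labels)}}


def namespace_matches(selector: dict, namespace_labels: Dict[str, str]) -> bool:
    """Mimic matchLabels: every wanted key must be present with an equal value."""
    wanted = selector["namespaceSelector"]["matchLabels"]
    return all(namespace_labels.get(k) == v for k, v in wanted.items())


selector = opt_in_namespace_selector(OPT_IN_LABEL)
print(json.dumps(selector, indent=2))

# A namespace that has not opted in is skipped by the webhook.
print(namespace_matches(selector, {"control-plane": "kubeflow"}))

# A namespace labelled for KFServing use is selected.
print(namespace_matches(selector, {"example.com/kfserving-enabled": "true"}))
```

With an opt-in selector, a webhook outage only affects the namespaces that were explicitly labelled, rather than every namespace in the cluster.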