[kubeflow 1.3] unable to route requests to the kfserving pod due to auth policy

/kind bug

What steps did you take and what happened: deployed an InferenceService example.

I am installing into a preexisting Knative deployment, so perhaps I'm missing some steps. When I make a request, using either the external route/ingress or the internal service, the request hangs for a long time and then times out. Retracing the steps of the request, I ended up in the activator pod and found this error message:

{"level":"error","ts":"2021-04-24T19:10:16.241Z","logger":"activator","caller":"net/revision_backends.go:322","msg":"Failed to probe clusterIP 172.30.171.229:80","knative.dev/controller":"activator","knative.dev/pod":"activator-886cd96fb-4gqq5","knative.dev/key":"raffa/flowers-sample-predictor-default-00002","error":"unexpected body: want \"queue\", got \"RBAC: access denied\"","stacktrace":"knative.dev/serving/pkg/activator/net.(*revisionWatcher).checkDests\n\t/opt/app-root/src/go/src/knative.dev/serving/pkg/activator/net/revision_backends.go:322\nknative.dev/serving/pkg/activator/net.(*revisionWatcher).run\n\t/opt/app-root/src/go/src/knative.dev/serving/pkg/activator/net/revision_backends.go:366"}
{"level":"warn","ts":"2021-04-24T19:10:16.440Z","logger":"activator","caller":"net/revision_backends.go:286","msg":"Failed probing pods","knative.dev/controller":"activator","knative.dev/pod":"activator-886cd96fb-4gqq5","knative.dev/key":"raffa/flowers-sample-predictor-default-00002","curDests":{"ready":"10.128.2.38:8012","notReady":""},"error":"unexpected body: want \"queue\", got \"RBAC: access denied\""}

So it looks like the activator pod is probing the kfserving pod for the length of the queue but is getting an RBAC error, presumably due to this Istio RBAC rule:

spec:
  rules:
    - when:
        - key: 'request.headers[kubeflow-userid]'
          values:
            - raffa
    - when:
        - key: source.namespace
          values:
            - raffa

This is a standard RBAC rule created by the Kubeflow profile in a multitenant deployment. I am not 100% sure that this is what is holding up the requests, but it seems likely, because when I forge a request that goes directly to the kfserving pod I get a response.

Am I missing something? This problem should affect any standard multitenant Kubeflow deployment; how is it normally fixed?

What did you expect to happen: being able to route requests to the kfserving pod.

Environment:

  • Istio Version: 1.6.5
  • Knative Version: 0.19.0
  • KFServing Version: 1.3
  • Kubeflow version: 1.3
  • Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
  • Minikube version: na
  • Kubernetes version: (use kubectl version): 1.20
  • OS (e.g. from /etc/os-release):

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 6
  • Comments: 14 (10 by maintainers)

Top GitHub Comments

markwinter commented, Jul 21, 2021 (5 reactions)

@jiaozhentian

It should look like this (plus one more with component: transformer):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allowlist-by-paths
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: predictor
  action: ALLOW
  rules:
  - to:
    - operation:
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*

An AuthorizationPolicy applied in istio-system (the Istio root namespace by default) applies to all namespaces, which is useful if, for example, you have kfserving in many namespaces.
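
The second policy mentioned above, for transformer pods, might look like the sketch below. It assumes the transformer needs the same path allowlist as the predictor, which the original comment does not spell out; the name `allowlist-by-paths-transformer` is hypothetical and only chosen to avoid clashing with the predictor policy.

```yaml
# Sketch only: mirrors the predictor policy above, with the selector
# switched to transformer pods. The metadata name is a hypothetical choice.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allowlist-by-paths-transformer
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: transformer
  action: ALLOW
  rules:
  - to:
    - operation:
        # Assumed to be the same paths as for the predictor; adjust as needed.
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*
```

Two separate policies are used because each AuthorizationPolicy has a single workload selector, so one policy cannot select both the predictor and the transformer labels at once.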

Tomcli commented, Oct 27, 2021 (3 reactions)

We should add this AuthorizationPolicy to the Kubeflow manifests because Kubeflow 1.4 still has this issue.
