
KubernetesExecutor does not work with exec authentication on kubernetes client >= 22.6.0


Apache Airflow version

2.2.4 (latest released)

What happened

When using KubernetesExecutor with in_cluster = False together with a kubeconfig whose user authenticates through an exec plugin (kubelogin), authentication fails and the scheduler gets 401 Unauthorized errors:

hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 747, in _adopt_completed_pods
hack-scheduler-1  |     pod_list = kube_client.list_namespaced_pod(namespace=self.kube_config.kube_namespace, **kwargs)
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
hack-scheduler-1  |     return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
hack-scheduler-1  |     return self.api_client.call_api(
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
hack-scheduler-1  |     return self.__call_api(resource_path, method,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
hack-scheduler-1  |     response_data = self.request(
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request
hack-scheduler-1  |     return self.rest_client.GET(url,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/rest.py", line 240, in GET
hack-scheduler-1  |     return self.request("GET", url,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/rest.py", line 234, in request
hack-scheduler-1  |     raise ApiException(http_resp=r)
hack-scheduler-1  | kubernetes.client.exceptions.ApiException: (401)
hack-scheduler-1  | Reason: Unauthorized
hack-scheduler-1  | HTTP response headers: HTTPHeaderDict({'Audit-Id': '9d09a92f-d294-4a82-9aac-bbafe9573469', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 27 Mar 2022 21:16:39 GMT', 'Content-Length': '129'})

I traced the source of the error back to RefreshConfiguration and created a workaround.

https://github.com/apache/airflow/blob/ee9049c0566b2539a247687de05f9cffa008f871/airflow/kubernetes/kube_client.py#L45-L46

Bypassing RefreshConfiguration by changing those two lines to:

config.load_kube_config(context=cluster_context, config_file=configfile)
cfg = None

resolves the problem. I am still debugging what exactly the problem is with RefreshConfiguration and kubelogin.
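The same workaround can be sketched as a standalone helper, assuming the `kubernetes` Python client is installed. The name `get_kube_client_without_refresh` is hypothetical (it is not Airflow's own `get_kube_client`). The helper lets `config.load_kube_config` populate the client library's default configuration instead of an Airflow `RefreshConfiguration` instance. Passing `configuration=None` to `ApiClient` is presumably what `cfg = None` amounts to in the patched Airflow code.

```python
def get_kube_client_without_refresh(cluster_context=None, config_file=None):
    """Hypothetical helper: build a CoreV1Api client without Airflow's RefreshConfiguration.

    Loads the kubeconfig into the kubernetes client's default Configuration,
    so the library's own exec-plugin handling (e.g. kubelogin) is used.
    """
    # Imported lazily so this module can be imported even where the
    # kubernetes client is not installed.
    from kubernetes import client, config

    # With no client_configuration argument, load_kube_config stores the
    # loaded settings as the library's default Configuration.
    config.load_kube_config(config_file=config_file, context=cluster_context)

    # configuration=None makes ApiClient fall back to that default Configuration.
    api_client = client.ApiClient(configuration=None)
    return client.CoreV1Api(api_client=api_client)
```

One hedged caveat: Airflow's RefreshConfiguration exists to refresh expiring API tokens. Bypassing it may therefore change behaviour for a long-running scheduler once an exec-issued token expires. It is worth checking how your client version handles token refresh.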

Factoids:

What you think should happen instead

Authentication should work without a problem.

How to reproduce

It’s hard to reproduce given the specificity of the problem.

  1. Create a service principal and assign permissions to be able to create resources on the AKS cluster.
  2. Install kubelogin
  3. Create a kubeconfig file that uses the kubelogin exec authentication flow with service principal login, filling in the correct values. See docs
  4. Confirm it works by running
    from kubernetes import client, config
    config.load_kube_config()
    print(client.CoreV1Api().list_namespaced_pod('default'))
    
  5. Try out with Airflow and get lots of 401 errors.
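For step 3, the user entry of such a kubeconfig can be sketched as a plain Python dictionary and serialized as JSON, which kubeconfig loaders accept as a subset of YAML. This is a hedged sketch. The function names and every argument value are hypothetical placeholders. The kubelogin arguments (`get-token`, `--login spn`, `--server-id`, `--client-id`, `--client-secret`, `--tenant-id`) are assumed from the kubelogin documentation, so check them against the docs for your kubelogin version.

```python
import json


def build_kubelogin_spn_kubeconfig(server, ca_data, server_id, client_id,
                                   client_secret, tenant_id, name="aks-cluster"):
    """Hypothetical helper: kubeconfig dict whose user authenticates via kubelogin (service principal)."""
    user_name = f"{name}-spn-user"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {"name": name,
             "cluster": {"server": server, "certificate-authority-data": ca_data}},
        ],
        "users": [
            {
                "name": user_name,
                "user": {
                    # Exec plugin: the client runs this command and reads an
                    # ExecCredential (containing a bearer token) from its stdout.
                    "exec": {
                        "apiVersion": "client.authentication.k8s.io/v1beta1",
                        "command": "kubelogin",
                        "args": [
                            "get-token",
                            "--login", "spn",
                            "--server-id", server_id,
                            "--client-id", client_id,
                            "--client-secret", client_secret,
                            "--tenant-id", tenant_id,
                        ],
                    }
                },
            }
        ],
        "contexts": [{"name": name, "context": {"cluster": name, "user": user_name}}],
        "current-context": name,
    }


def to_kubeconfig_text(cfg):
    """Serialize the dict as indented JSON, suitable for writing to a kubeconfig file."""
    return json.dumps(cfg, indent=2)
```

You can write the resulting text to a file and pass that file to the smoke test in step 4 with `config.load_kube_config(config_file=...)`.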

Operating System

Debian

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

Proof-of-concept deployment with Docker Compose for local development, using KubernetesExecutor to schedule worker pods in an AKS cluster.

Anything else

This issue happens every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct


Top GitHub Comments

jedcunningham commented, Mar 28, 2022

I’ll also add that you should be using constraints, which pin you to versions that were all tested together: https://github.com/apache/airflow/blob/constraints-2.2.4/constraints-3.7.txt https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#constraints-files

That would have gotten you kubernetes 11.0.0 and apache-airflow-providers-cncf-kubernetes 3.0.2, which also work together.
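Here is a hedged illustration of how a constraints file pins versions. The URL pattern follows the one in the Airflow installation docs linked above (`constraints-<airflow version>/constraints-<python version>.txt`, served from raw.githubusercontent.com). The parser reads `name==version` lines. Both function names are hypothetical.

```python
def constraints_url(airflow_version, python_version):
    """Hypothetical helper: raw URL of an Airflow constraints file (pattern from the install docs)."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def parse_constraints(text):
    """Hypothetical helper: map lower-cased package name -> pinned version from 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blank lines and non-exact pins
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins
```

The documented way to apply the file is `pip install "apache-airflow==2.2.4" --constraint <url>`. Fetching and parsing the file as above only shows which kubernetes and provider versions it pins.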

jedcunningham commented, Mar 28, 2022

@dszakallas, not sure what version of the kubernetes provider you have, but try with this combo:

kubernetes==11.0.0
apache-airflow-providers-cncf-kubernetes==3.1.1

Basically, Airflow 2.2.4 only works with kubernetes 11.0.0, and provider 3.1.2 only works with kubernetes >= 21.7.0. The “common ground” is the combination above.
