Failed to adopt pod: Scheduler cannot adopt running pods
Apache Airflow version
2.2.4
What happened
I am running Airflow 2.2.4 on Kubernetes, using the KubernetesExecutor. If I re-create the scheduler pod, it attempts to adopt running job pods but fails to do so:
[2022-05-10 11:35:23,707] {kubernetes_executor.py:714} INFO - Failed to adopt pod <REDACTED>. Reason: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ac152b63-74ef-48c2-b4eb-fe5fbc808a56', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '251e299d-3b5d-4c7a-a3d9-46f17a316c93', 'X-Kubernetes-Pf-Prioritylevel-Uid': '66005fda-3f10-4344-8170-8c819dbbf59f', 'Date': 'Tue, 10 May 2022 11:35:23 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"<REDACTED>\" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations) <TRUNCATED>
The pods are then killed by the scheduler.
I tried to dig into the code as best I could, and found that this might be caused by the KubernetesExecutor trying to update the pod’s metadata.labels when it adopts the pod, but I could be wrong, as I’m not very familiar with this part of Airflow.
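If the label update really is what triggers the 422, one possible way to stay within what the API server accepts is to send a patch that contains only the labels, rather than the whole pod object. The following is a minimal sketch of that idea, not Airflow's actual code: `build_adoption_patch`, `OWNER_LABEL`, and the sample pod are hypothetical names introduced for illustration, and the `airflow-worker` label key is an assumption about how the executor marks ownership.

```python
from typing import Any, Dict

# Assumption: the label key the executor uses to record the owning scheduler.
OWNER_LABEL = "airflow-worker"


def build_adoption_patch(pod: Dict[str, Any], scheduler_job_id: str) -> Dict[str, Any]:
    """Return a patch body that touches only metadata.labels (hypothetical helper)."""
    current = (pod.get("metadata") or {}).get("labels") or {}
    labels = dict(current)  # copy, so the original pod dict is left unmodified
    labels[OWNER_LABEL] = str(scheduler_job_id)
    # No "spec" key at all: the error above forbids changing most spec fields.
    return {"metadata": {"labels": labels}}


# Made-up pod for illustration only.
running_pod = {
    "metadata": {
        "name": "example-task-pod",
        "labels": {"airflow-worker": "41", "dag_id": "example"},
    },
    "spec": {"containers": [{"name": "base", "image": "apache/airflow:2.2.4"}]},
}

patch = build_adoption_patch(running_pod, "42")
print(patch)
```

A body like this could then be passed to the Kubernetes Python client's `CoreV1Api.patch_namespaced_pod(name, namespace, body)`. Whether this matches how the problem was eventually resolved upstream is not established by this report.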
What you think should happen instead
The scheduler should be able to adopt running pods instead of killing them.
How to reproduce
- Run Airflow with the KubernetesExecutor
- Start a long-running task
- Re-create the scheduler pod
Operating System
Debian GNU/Linux 10 (buster)
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
This happens when the scheduler pod is re-created while a job pod is running.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct

@potiuk happy to report that upgrading to 2.3.1 fixed the issue for us. Thank you!
Glad it helped 😃