Failed to adopt pod: Scheduler cannot adopt running pods

Apache Airflow version

2.2.4

What happened

I am running Airflow 2.2.4 on Kubernetes, using the KubernetesExecutor. If I re-create the scheduler pod, it attempts to adopt running job pods but fails to do so:

[2022-05-10 11:35:23,707] {kubernetes_executor.py:714} INFO - Failed to adopt pod <REDACTED>. Reason: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ac152b63-74ef-48c2-b4eb-fe5fbc808a56', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '251e299d-3b5d-4c7a-a3d9-46f17a316c93', 'X-Kubernetes-Pf-Prioritylevel-Uid': '66005fda-3f10-4344-8170-8c819dbbf59f', 'Date': 'Tue, 10 May 2022 11:35:23 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"<REDACTED>\" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations) <TRUNCATED>

The pods are then killed by the scheduler.
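The 422 in the log comes from the API server's rule that most of a running pod's spec is immutable. The sketch below is a simplified client-side version of that rule, written only from the fields the error message lists. It can help check whether a request body would trip the rule.

It compares plain dicts that use Kubernetes JSON field names. It ignores finer details, such as how `activeDeadlineSeconds` may change or fields that newer Kubernetes versions allow. The function names are hypothetical.

```python
from typing import Any, Dict, List

# Top-level spec fields the quoted error message says may change on a running pod.
_MUTABLE_TOP_LEVEL = {"containers", "initContainers", "activeDeadlineSeconds", "tolerations"}


def _strip_image(containers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the containers without their (mutable) `image` field."""
    return [{k: v for k, v in c.items() if k != "image"} for c in containers]


def forbidden_spec_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List spec fields whose change the quoted rule would reject (simplified sketch)."""
    problems = []
    # Every other top-level spec field must stay exactly the same.
    for key in sorted(set(old) | set(new)):
        if key not in _MUTABLE_TOP_LEVEL and old.get(key) != new.get(key):
            problems.append(f"spec.{key}")
    # Containers and init containers may only change their image.
    for key in ("containers", "initContainers"):
        if _strip_image(old.get(key, [])) != _strip_image(new.get(key, [])):
            problems.append(f"spec.{key}")
    # Tolerations may only be added, so every existing toleration must survive.
    new_tolerations = new.get("tolerations", [])
    if any(t not in new_tolerations for t in old.get("tolerations", [])):
        problems.append("spec.tolerations")
    return problems
```

For example, changing only a container's image returns an empty list. Changing `nodeName` or dropping an existing toleration returns the offending field path.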

I dug into the code as best I could and found that this might be caused by the KubernetesExecutor trying to update the pod's metadata.labels during adoption, but I could be wrong, as I'm not very familiar with this part of Airflow.
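If relabeling is the culprit, the idea would be to send a patch whose body contains only `metadata.labels`. That way nothing in the pod's spec is submitted for change. The sketch below is a minimal illustration under stated assumptions, not Airflow's actual code or fix:

- The label key `airflow-worker` is assumed to be the one that ties a worker pod to a scheduler.
- `label_only_patch` and `adopt_pod` are hypothetical helpers.
- `api` stands in for an object with the Kubernetes Python client's `CoreV1Api.patch_namespaced_pod(name, namespace, body)` method.

```python
from typing import Any, Dict, Protocol


class PodPatcher(Protocol):
    """Anything exposing a CoreV1Api-style patch_namespaced_pod method."""

    def patch_namespaced_pod(self, name: str, namespace: str, body: Dict[str, Any]) -> Any: ...


def label_only_patch(scheduler_job_id: str) -> Dict[str, Any]:
    """Build a patch body that touches metadata.labels and nothing under spec."""
    # Assumption: the worker label key is "airflow-worker" (hypothetical here).
    return {"metadata": {"labels": {"airflow-worker": scheduler_job_id}}}


def adopt_pod(api: PodPatcher, name: str, namespace: str, scheduler_job_id: str) -> Any:
    """Re-label a running pod for a new scheduler without sending its spec."""
    return api.patch_namespaced_pod(
        name=name, namespace=namespace, body=label_only_patch(scheduler_job_id)
    )
```

With the real client, `api` would be a `kubernetes.client.CoreV1Api()` instance. Because the body never mentions `spec`, the merged pod keeps its existing spec, which is the part the API server refuses to change.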

What you think should happen instead

The scheduler should be able to adopt running pods instead of killing them.

How to reproduce

  • Run Airflow with the KubernetesExecutor
  • Start a long-running task
  • Re-create the scheduler pod

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

This happens when the scheduler pod is re-created while a job pod is running.

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
olivermeyer commented, Jun 13, 2022

@potiuk happy to report that upgrading to 2.3.1 fixed the issue for us. Thank you!

0 reactions
potiuk commented, Jun 14, 2022

Glad it helped 😃
