Failed to adopt pod: Scheduler cannot adopt running pods

Apache Airflow version

2.2.4

What happened

I am running Airflow 2.2.4 on Kubernetes, using the KubernetesExecutor. If I re-create the scheduler pod, it attempts to adopt running job pods but fails to do so:

[2022-05-10 11:35:23,707] {kubernetes_executor.py:714} INFO - Failed to adopt pod <REDACTED>. Reason: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ac152b63-74ef-48c2-b4eb-fe5fbc808a56', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '251e299d-3b5d-4c7a-a3d9-46f17a316c93', 'X-Kubernetes-Pf-Prioritylevel-Uid': '66005fda-3f10-4344-8170-8c819dbbf59f', 'Date': 'Tue, 10 May 2022 11:35:23 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"<REDACTED>\" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations) <TRUNCATED>

The pods are then killed by the scheduler.

I tried to dig into the code as best I could, and I found that this might be caused by the KubernetesExecutor trying to update the pod’s metadata.labels during adoption, but I could be wrong as I’m not very familiar with this part of Airflow.
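To make the suspected cause concrete, here is a minimal sketch in plain Python (no Kubernetes client needed) of the difference between sending a whole pod object back to the API server and sending a patch that touches only metadata.labels. It is not Airflow's actual adoption code and not necessarily what changed in later releases. The helper names, the example pod, and the `airflow-worker` label key are assumptions made for illustration.

```python
import copy
from typing import Any, Dict

# Assumed label key recording which scheduler owns a pod (illustrative only).
OWNER_LABEL = "airflow-worker"


def full_object_body(pod: Dict[str, Any], scheduler_job_id: str) -> Dict[str, Any]:
    """Hypothetical helper: copy the whole pod and change one label.

    The returned body still carries `spec`, so any difference between it and
    the live pod's spec would count as a spec update on the server side.
    """
    body = copy.deepcopy(pod)
    body.setdefault("metadata", {}).setdefault("labels", {})[OWNER_LABEL] = scheduler_job_id
    return body


def labels_only_patch(scheduler_job_id: str) -> Dict[str, Any]:
    """Hypothetical helper: a patch body that contains nothing but the label."""
    return {"metadata": {"labels": {OWNER_LABEL: scheduler_job_id}}}


# Made-up pod used only to exercise the two helpers.
example_pod = {
    "metadata": {"name": "example-task-pod", "labels": {"dag_id": "example"}},
    "spec": {"containers": [{"name": "base", "image": "example:latest"}]},
}

full_body = full_object_body(example_pod, "42")
patch = labels_only_patch("42")
```

With the official Kubernetes Python client, a body like `patch` could be passed to `CoreV1Api.patch_namespaced_pod(name, namespace, body)`. Because it contains no `spec` key, it cannot trip the "pod updates may not change fields other than ..." validation shown in the log above.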

What you think should happen instead

The scheduler should be able to adopt running pods instead of killing them.

How to reproduce

1. Run Airflow with the KubernetesExecutor
2. Start a long-running task
3. Re-create the scheduler pod

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

This happens when the scheduler pod is re-created while a job pod is running.

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project’s Code of Conduct

Top GitHub Comments

olivermeyer commented, Jun 13, 2022

@potiuk happy to report that upgrading to 2.3.1 fixed the issue for us. Thank you!

potiuk commented, Jun 14, 2022

> @potiuk happy to report that upgrading to 2.3.1 fixed the issue for us. Thank you!

Glad it helped 😃
