Can we add a configurable debug setting to delay pod deletion when a pod is in the `Error` state?

Description

Add a configurable debug setting that delays pod deletion when a pod is in the Error state.

Use case / motivation

This is with the apache/airflow:1.10.10 image.

I'm deploying Airflow on Kubernetes and want to use the Kubernetes Executor for task execution. If a pod enters the Error state, the Airflow scheduler deletes it immediately, so we cannot see what happened: the pod is gone within a few seconds.

As a workaround, I added time.sleep() in kubernetes_executor.py:896, like this:

    # Requires `import time` at the top of kubernetes_executor.py.
    def _change_state(self, key, state, pod_id, namespace):
        if state != State.RUNNING:
            if self.kube_config.delete_worker_pods:
                # Debug hack: wait 120 seconds before deleting the pod,
                # so there is time to inspect it with kubectl.
                for x in range(120):
                    self.log.info('%s: sleep 1s for...', x)
                    time.sleep(1)
                self.kube_scheduler.delete_pod(pod_id, namespace)
                self.log.info('Deleted pod: %s in namespace %s', str(key), str(namespace))
            try:
                self.running.pop(key)
            except KeyError:
                self.log.debug('Could not find key: %s', str(key))
        self.event_buffer[key] = state
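
The request above could be sketched as a standalone helper, shown here only as an illustration and not as Airflow's actual implementation. The function name, the `failed_pod_delete_delay` parameter, and the `FAILED` constant are hypothetical; in Airflow the state would come from `State` and the delete call from the executor's own scheduler object.

```python
import time
from typing import Callable

# Hypothetical stand-in for Airflow's failed task state.
FAILED = "failed"


def delete_pod_with_debug_delay(
    delete_pod: Callable[[str, str], None],
    pod_id: str,
    namespace: str,
    state: str,
    delete_worker_pods: bool = True,
    failed_pod_delete_delay: int = 0,  # hypothetical setting, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Delete a finished worker pod, waiting first if the task failed.

    Returns True if the delete callback was invoked, False otherwise.
    """
    if not delete_worker_pods:
        # Pods are kept around; nothing to delete.
        return False
    if state == FAILED and failed_pod_delete_delay > 0:
        # Leave the failed pod in place long enough to run
        # `kubectl logs` or `kubectl describe` against it.
        sleep(failed_pod_delete_delay)
    delete_pod(pod_id, namespace)
    return True
```

With a delay of 0 the helper behaves like the unpatched code path. Like the patch above, it blocks the caller while it sleeps, so it is only suitable for debugging.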

When I trigger a run manually, I can see that the pod enters the Error state soon afterwards.

➜  ~ kubectl get po
NAME                                                         READY   STATUS    RESTARTS   AGE
airflow-564c84ff46-tn5mg                                     2/2     Running   0          67s
examplebashoperatorrunme0-76fd68aa96d64e8c93c7c87904f3312a   0/1     Error     0          24s

Watching the pod's log:

➜  ~ kubectl logs -f examplebashoperatorrunme0-76fd68aa96d64e8c93c7c87904f3312a
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 23, in <module>
    import argcomplete
ModuleNotFoundError: No module named 'argcomplete'

It's an error in the container, and with the pod still around it is easy to debug.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 6
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

12 reactions
hcbraun commented, Apr 29, 2020

AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"

3 reactions
hcbraun commented, Apr 29, 2020

Hi gwind, how did you solve the container error `ModuleNotFoundError: No module named 'argcomplete'`? I have the same issue in pods with the Kubernetes executor and the example DAGs.

There is an option to keep (not delete) worker pods: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "false"
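
For reference, the variable name follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding configuration through the environment. The snippet below is a rough sketch of how such a setting maps to a boolean. Both helper functions are hypothetical and are not Airflow's own code, and the default of "True" is an assumption that matches the delete-by-default behavior described in this issue.

```python
import os
from typing import Mapping


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config section and key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def should_delete_worker_pods(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the delete_worker_pods override from the environment.

    Hypothetical helper: falls back to "True" when the variable is unset.
    """
    raw = environ.get(airflow_env_var("kubernetes", "delete_worker_pods"), "True")
    return raw.strip().lower() in ("true", "t", "1")
```

Setting the variable to "false" in the scheduler's environment would make this helper return False, which corresponds to keeping failed pods around for inspection.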
