KubernetesPodOperator does not return XCOM on pod failure

Apache Airflow version: 1.10.9

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.14.9

Environment:

  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g. from /etc/os-release): Linux (Debian 9.12 inside a Docker image)
  • Kernel (e.g. uname -a): 5.4.0 (on my host)
  • Install tools:
  • Others:

What happened:

I ran a new task using the KubernetesPodOperator on our k8s cluster. The pod is designed to write to /airflow/xcom/return.json even when it fails, so that a following task can send a user-friendly error message. The pod exits with a non-zero exit code, so Airflow correctly marks the task as failed, but the XCom values are not available.
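The symptom follows from how a task's return value becomes an XCom. As far as we can tell from the 1.10 code, the task runner pushes whatever execute() returns under the return_value key only after execute() returns normally. An operator that raises before returning therefore pushes nothing. The toy model below illustrates this. It is not Airflow code: xcom_store, run_task and pod_execute_pre_patch are hypothetical names, and RuntimeError stands in for AirflowException.

```python
import json

# Hypothetical stand-in for the XCom table, keyed by (task_id, key).
xcom_store = {}


def run_task(task_id, execute):
    """Toy task runner: the value returned by execute() is stored under the
    'return_value' key only if execute() returns without raising."""
    try:
        result = execute()
    except Exception as exc:
        # The task is marked failed; nothing is pushed.
        return "failed: {}".format(exc)
    if result is not None:
        xcom_store[(task_id, "return_value")] = result
    return "success"


def pod_execute_pre_patch():
    """Mimics the 1.10.9 ordering: the failure check raises before the
    parsed /airflow/xcom/return.json content would be returned."""
    result = json.loads('{"success": false}')  # content the pod wrote
    final_state = "failed"                     # pod exited non-zero
    if final_state != "success":
        raise RuntimeError("Pod returned a failure: " + final_state)
    return result


state = run_task("kpo_task", pod_execute_pre_patch)
print(state)                                       # failed: Pod returned a failure: failed
print(("kpo_task", "return_value") in xcom_store)  # False
```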

What you expected to happen:

I expected XCom values to be available even on pod failure. We use this capability in other operators to signal error conditions and messages.

How to reproduce it:

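Run a KubernetesPodOperator with a command like this in an Alpine image. Alpine provides /bin/sh rather than bash, and the inner double quotes are escaped so that the file contains valid JSON:

/bin/sh -c 'echo "{\"success\": false}" > /airflow/xcom/return.json; exit 1'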

Check the XCOM results, which should include the JSON dictionary.
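Nested shell quoting in a command like this is easy to get wrong. The operator parses the extracted return.json with json.loads, so the file must hold valid JSON. Python's standard shlex module approximates POSIX shell quoting, which makes it a quick way to see what the inner echo would actually write. The sketch below compares a version with single quotes nested inside single quotes against one with escaped double quotes. The helper names are hypothetical.

If the command is passed to the operator as a list instead (for example cmds=["/bin/sh", "-c", script]), no outer shell is involved. In that case only the second split applies.

```python
import json
import shlex

# Single quotes nested inside single quotes: the outer shell strips them.
naive = """/bin/bash -c 'echo "{'success': False}" > /airflow/xcom/return.json; exit 1'"""
# Same intent, with the JSON double quotes escaped for the inner shell.
escaped = r"""/bin/sh -c 'echo "{\"success\": false}" > /airflow/xcom/return.json; exit 1'"""


def written_payload(command):
    """Approximate the text the inner `echo` writes: split the outer command
    line, then split the script passed to -c, using POSIX quoting rules."""
    script = shlex.split(command)[2]  # the argument following -c
    return shlex.split(script)[1]     # the argument passed to echo


def is_valid_json(text):
    """True if json.loads accepts the text."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


for cmd in (naive, escaped):
    payload = written_payload(cmd)
    print(repr(payload), is_valid_json(payload))
# '{success: False}' False
# '{"success": false}' True
```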

Anything else we need to know:

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

3 reactions
dstandish commented, Dec 8, 2021

Yeah, I think we should go ahead and make this change with no option: just make it push XCom in a finally block or something. But we have to check on timing. There's a refactor of KPO in progress, and it may make sense to include the change as part of that.
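Here is one way to read the "push in a finally" suggestion, as a minimal pure-Python sketch. It is not the upstream implementation. The xcom dict, PodFailedError and this execute signature are all hypothetical stand-ins for Airflow's XCom storage, AirflowException and the operator method. The parsed return.json is pushed whether or not the pod succeeded, and the failure still propagates afterwards.

```python
import json

xcom = {}  # hypothetical stand-in for the XCom table


class PodFailedError(Exception):
    """Hypothetical stand-in for AirflowException."""


def execute(final_state, return_json, do_xcom_push=True):
    """Sketch of pushing XCom in a `finally` block: the parsed contents of
    /airflow/xcom/return.json are pushed on both the success and failure
    paths, and any failure is re-raised once the block finishes."""
    result = None
    try:
        result = json.loads(return_json)  # what the sidecar read back
        if final_state != "success":
            raise PodFailedError("Pod returned a failure: " + final_state)
    finally:
        # The value is pushed explicitly rather than returned, so a runner
        # that also pushes return values would not push it a second time.
        if do_xcom_push and result is not None:
            xcom["return_value"] = result


try:
    execute("failed", '{"success": false}')
except PodFailedError as exc:
    print("task failed:", exc)  # task failed: Pod returned a failure: failed
print(xcom)                     # {'return_value': {'success': False}}
```

Because the push sits in finally, it also runs if something after parsing raises a different error, not only the explicit state check.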

1 reaction
jvstein commented, Sep 22, 2020

@Shivarp1 - I have not tested in Airflow 1.10.12. Reading through the relevant section on the 1.10.12 tag, I suspect the same issue exists.

I just noticed that my repro steps had a bug in the command. It should have been exit 1, not echo 1, at the end. I updated the description.

We’re currently using 1.10.9, with the following patch.

diff --git c/airflow/contrib/operators/kubernetes_pod_operator.py i/airflow/contrib/operators/kubernetes_pod_operator.py
index f692599d7..f4b970d7e 100644
--- c/airflow/contrib/operators/kubernetes_pod_operator.py
+++ i/airflow/contrib/operators/kubernetes_pod_operator.py
@@ -20,6 +20,7 @@ import warnings

 from airflow.exceptions import AirflowException
 from airflow.models import BaseOperator
+from airflow.models import XCOM_RETURN_KEY
 from airflow.utils.decorators import apply_defaults
 from airflow.contrib.kubernetes import kube_client, pod_generator, pod_launcher
 from airflow.contrib.kubernetes.pod import Resources
@@ -253,12 +254,13 @@ class KubernetesPodOperator(BaseOperator):  # pylint: disable=too-many-instance-
                 if self.is_delete_operator_pod:
                     launcher.delete_pod(pod)

+            if self.do_xcom_push:
+                self.xcom_push(context, XCOM_RETURN_KEY, result)
+
             if final_state != State.SUCCESS:
                 raise AirflowException(
                     'Pod returned a failure: {state}'.format(state=final_state)
                 )
-            if self.do_xcom_push:
-                return result
         except AirflowException as ex:
             raise AirflowException('Pod Launching failed: {error}'.format(error=ex))

Top Results From Across the Web

Failed to extract xcom from airflow pod - Stack Overflow
Hey! I'm trying to run my code as in your example, but Kubernetes operator (GKEPodOperator in my case) does not return any value...
[GitHub] [airflow] jvstein commented on issue ... - The Mail Archive
[GitHub] [airflow] jvstein commented on issue #8792: KubernetesPodOperator does not return XCOM on pod failure · 2020-09-25 Thread GitBox.
A closer look at Airflow's KubernetesPodOperator and XCom
An Airflow task instance described by the KubernetesPodOperator can write a dict to the file /airflow/xcom/return.json (always the same ...
KubernetesPodOperator — apache-airflow-providers-cncf ...
This will create a sidecar container that runs alongside the Pod. The Pod must write the XCom value into this location at the...
Use the KubernetesPodOperator | Astronomer Documentation
The full DAG code is provided in the following example. To avoid task failure, turn on do_xcom_push after you create the airflow/xcom/return.json within...
