
Option to deserialize JSON from last log line in BashOperator and DockerOperator before sending to XCom


Description

To create an XCom value with a BashOperator or a DockerOperator, we can use the do_xcom_push option, which pushes the last line of the command's log output to XCom.
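
As a rough model of what ends up in XCom, the pushed value is simply the last line of the command's output, as a string. This is a simplification for illustration, not Airflow's actual implementation, and last_output_line is a hypothetical helper:

```python
def last_output_line(stdout: str) -> str:
    """Return the last line of a command's output, as do_xcom_push would see it.

    Simplified model: the real operators read the output stream line by line;
    here the whole stdout is passed in as one string.
    """
    lines = stdout.splitlines()
    # With no output at all there is no last line; use an empty string.
    return lines[-1].rstrip() if lines else ""


stdout = 'starting task A\n{"key1":1,"key2":3}\n'
value = last_output_line(stdout)
print(type(value).__name__, value)  # str {"key1":1,"key2":3}
```

Whatever the command prints, the XCom value is a str, which is the root of the problem described below.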

It would be useful to provide an xcom_json option that deserializes this last log line, if it is a JSON string, before pushing it to XCom. Its attributes could then be accessed in downstream tasks with the xcom_pull() method.
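
The proposed behaviour could look roughly like the function below. The xcom_json name comes from this proposal and is not an existing Airflow parameter; to_xcom_value is a hypothetical helper. Falling back to the raw string keeps non-JSON output working exactly as it does today.

```python
import json
from typing import Any


def to_xcom_value(last_line: str, xcom_json: bool = False) -> Any:
    """Hypothetical: decide what gets pushed to XCom for the last output line."""
    if not xcom_json:
        return last_line  # current behaviour: push the raw string
    try:
        return json.loads(last_line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return last_line  # not JSON: keep the string unchanged


print(to_xcom_value('{"key1":1,"key2":3}', xcom_json=True)["key2"])  # 3
print(to_xcom_value("done", xcom_json=True))  # done
```

One design consequence: with the option enabled, output such as 3 or true would also be converted (to an int or a bool), not only JSON objects.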

Use case/motivation

See my Stack Overflow post: https://stackoverflow.com/questions/74083466/how-to-deserialize-xcom-strings-in-airflow

Consider a DAG with two tasks, Task A >> Task B (BashOperators or DockerOperators), that need to communicate through XComs.

  • Task A outputs the information as a one-line JSON string on stdout, which appears in Task A's logs and therefore in its return_value XCom key when do_xcom_push=True. For instance: {"key1":1,"key2":3}

  • Task B only needs the key2 value from Task A, so the return_value XCom of Task A has to be deserialized to extract that value and pass it directly to Task B with the Jinja template {{xcom_pull('task_a')['key2']}}. Used as is, this raises jinja2.exceptions.UndefinedError: 'str object' has no attribute 'key2', because return_value is just a string.
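
The same failure can be reproduced in plain Python: indexing a str with a string key fails, while the parsed dict works. Jinja reports it differently (Airflow renders templates with jinja2.StrictUndefined by default, so the missing item surfaces as an UndefinedError), but the root cause is the same.

```python
import json

# What Task A pushes today: the last output line, still a str
return_value = '{"key1":1,"key2":3}'

try:
    key2 = return_value["key2"]  # a str cannot be indexed by a string key
except TypeError as exc:
    print("str:", exc)  # message wording varies by Python version

parsed = json.loads(return_value)  # now a dict
print("dict:", parsed["key2"])  # dict: 3
```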

By comparison, Airflow Variables can already be deserialized in Jinja templates (e.g. {{ var.json.my_var.path }}). I would like to do the same with XComs.

Current workaround:

We can create a custom operator (a subclass of BashOperator or DockerOperator) and extend its execute method to:

  1. call the original execute method;
  2. take the last log line it returns;
  3. try to parse it into a Python dictionary with json.loads();
  4. return the result (now a dictionary, not a string).

With this, the Jinja template {{ xcom_pull('task_a')['key2'] }} works in Task B, because the XCom value is now a Python dictionary.

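import json

from airflow.operators.bash import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator


class BashOperatorExtended(BashOperator):
    def execute(self, context):
        output = super().execute(context)  # last line of stdout, as a str
        try:
            # Turn a JSON string into a Python object (e.g. a dict)
            output = json.loads(output)
        except (TypeError, ValueError):
            # Not JSON (or no output): push the value unchanged
            pass
        return output


class DockerOperatorExtended(DockerOperator):
    def execute(self, context):
        output = super().execute(context)
        try:
            output = json.loads(output)
        except (TypeError, ValueError):
            pass
        return output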
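
The two subclasses above duplicate the same override. A mixin placed first in the base-class list keeps that logic in one place: super().execute() then resolves to the operator's own execute through Python's method resolution order. In this sketch, JsonOutputMixin and the stub classes are hypothetical names, and StubBashOperator only stands in for BashOperator so the example runs without Airflow installed.

```python
import json
from typing import Any


class JsonOutputMixin:
    """Hypothetical mixin: try to parse an operator's return value as JSON."""

    def execute(self, context: Any) -> Any:
        output = super().execute(context)  # the wrapped operator's execute
        try:
            return json.loads(output)
        except (TypeError, ValueError):  # None/non-str output, or not JSON
            return output


class StubBashOperator:
    """Stand-in for BashOperator: returns the last output line as a str."""

    def __init__(self, last_line: str) -> None:
        self.last_line = last_line

    def execute(self, context: Any) -> Any:
        return self.last_line


class JsonBashOperator(JsonOutputMixin, StubBashOperator):
    pass


print(JsonBashOperator('{"key1":1,"key2":3}').execute({})["key2"])  # 3
print(JsonBashOperator("plain text").execute({}))  # plain text
```

With Airflow installed, the mixin would be combined with the real classes instead, e.g. class JsonBashOperator(JsonOutputMixin, BashOperator). It still means defining a custom operator, which is the limitation raised next.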

But creating a new operator just for that purpose is not really satisfying…

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
uranusjr commented, Nov 10, 2022

If the goal is to make Jinja2 templating simpler (this is not an issue with TaskFlow), could the simplest way be to add a built-in macro for this?

{{ json_loads(xcom_pull('task_a'))['key2'] }}
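
Until such a built-in exists, something close can be approximated with the DAG's user_defined_macros argument, which exposes extra names to every template in that DAG. Here json_loads is a user-chosen name taken from the comment above, not an Airflow built-in, and in templates xcom_pull is normally reached through the task instance (ti.xcom_pull). The function itself is plain Python, so it can be checked without Airflow:

```python
import json
from typing import Any


def json_loads(value: Any) -> Any:
    """Parse a JSON string; pass through values that are not strings."""
    return json.loads(value) if isinstance(value, (str, bytes, bytearray)) else value


# Passed to the DAG as: DAG(..., user_defined_macros=TEMPLATE_MACROS)
# Then, in a templated field of Task B:
#   {{ json_loads(ti.xcom_pull(task_ids='task_a'))['key2'] }}
TEMPLATE_MACROS = {"json_loads": json_loads}

print(TEMPLATE_MACROS["json_loads"]('{"key1":1,"key2":3}')["key2"])  # 3
```

Passing non-strings through unchanged means the same template keeps working if the upstream value is already deserialized. The DAG's user_defined_filters argument works the same way if a filter reads better, e.g. {{ (ti.xcom_pull(task_ids='task_a') | fromjson)['key2'] }} with a user-chosen fromjson filter.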
1 reaction
potiuk commented, Oct 24, 2022

The previous jinja template {{ xcom_pull('task_a')['key2'] }} is now working in task B, since the XCom value is now a Python dictionary.

Actually, now that I think of it, this could be made into a common “AbstractOperator” feature. We could add a “deserialize_output” parameter so that any operator can use it. I think we should even deserialize it using YAML, because then we would automatically handle both YAML and JSON (YAML is actually a 100% compatible superset of JSON: every valid JSON document is also valid YAML).

WDYT @uranusjr? I think having it as a common operator feature (disabled by default) would be quite powerful and could make a number of existing operators much easier to work with.
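
A rough sketch of the operator-level option suggested here, with the parser made pluggable so JSON or YAML can be chosen. deserialize_output is only the parameter name proposed in this comment; SketchBaseOperator, EchoOperator, run() and output_loader are hypothetical. The sketch defaults to json.loads so it runs with the standard library; yaml.safe_load from PyYAML could be passed instead.

```python
import json
from typing import Any, Callable


class SketchBaseOperator:
    """Hypothetical base class with an opt-in, disabled-by-default output parser."""

    def __init__(
        self,
        deserialize_output: bool = False,
        output_loader: Callable[[str], Any] = json.loads,
    ) -> None:
        self.deserialize_output = deserialize_output
        self.output_loader = output_loader

    def run(self) -> Any:
        """Subclasses return their raw result here (e.g. the last log line)."""
        raise NotImplementedError

    def execute(self, context: Any) -> Any:
        output = self.run()
        if not self.deserialize_output or not isinstance(output, str):
            return output
        try:
            return self.output_loader(output)
        except Exception:  # loader-specific parse errors: keep the raw string
            return output


class EchoOperator(SketchBaseOperator):
    """Toy operator that 'prints' a fixed line."""

    def __init__(self, line: str, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.line = line

    def run(self) -> Any:
        return self.line


print(EchoOperator('{"key2": 3}', deserialize_output=True).execute({}))  # {'key2': 3}
print(EchoOperator('{"key2": 3}').execute({}))  # {"key2": 3}
```

The broad except is deliberate because the loader is pluggable: json.loads raises ValueError, while PyYAML raises yaml.YAMLError. One trade-off of YAML: PyYAML follows YAML 1.1, where unquoted words such as yes, no, on and off load as booleans, so a YAML loader would reinterpret some plain-text output that the JSON loader leaves untouched.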


