Option to deserialize JSON from last log line in BashOperator and DockerOperator before sending to XCom

Description

To create an XCom value with a BashOperator or a DockerOperator, we can use the do_xcom_push option, which pushes the last line of the command logs to XCom.

It would be useful to have an xcom_json option that deserializes this last log line, when it is a JSON string, before sending it as an XCom. This would make it possible to access its attributes later in other tasks with the xcom_pull() method.

Use case/motivation

See my Stack Overflow post: https://stackoverflow.com/questions/74083466/how-to-deserialize-xcom-strings-in-airflow

Consider a DAG containing two tasks: DAG: Task A >> Task B (BashOperators or DockerOperators). They need to communicate through XComs.

Task A outputs its information as a one-line JSON string on stdout, which can then be retrieved from the logs of Task A, and therefore from its return_value XCom key if do_xcom_push=True. For instance: {"key1":1,"key2":3}
Task B only needs the key2 value from Task A, so we need to deserialize the return_value XCom of Task A to extract only this value and pass it directly to Task B, using the Jinja template {{xcom_pull('task_a')['key2']}}. Used as is, this results in jinja2.exceptions.UndefinedError: 'str object' has no attribute 'key2', because return_value is just a string.
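The difference between the two cases can be reproduced outside Airflow. The following is a minimal sketch in plain Python, not Airflow code: raw_xcom stands in for the string stored under return_value, and get_key is a hypothetical helper introduced only for this illustration.

```python
import json

# Stand-in for the return_value XCom of Task A: the last stdout line, as a string.
raw_xcom = '{"key1":1,"key2":3}'


def get_key(value, key):
    """Return value[key], or None when value cannot be indexed by that key."""
    try:
        return value[key]
    except (TypeError, KeyError):
        return None


# Indexing the raw string with a string key fails, so the helper returns None.
from_string = get_key(raw_xcom, "key2")

# After deserialization the value is a dict and the lookup succeeds.
from_dict = get_key(json.loads(raw_xcom), "key2")
```

In a Jinja template the failed lookup surfaces as the UndefinedError quoted above rather than as a Python TypeError, but the cause is the same: the XCom value is a string, not a dictionary.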

For comparison, Airflow Variables can already be deserialized in Jinja templates (e.g. {{ var.json.my_var.path }}). I would like to be able to do the same thing with XComs.

Current workaround:

We can create a custom operator (inheriting from BashOperator or DockerOperator) and extend its execute method so that it:

calls the original execute method
takes the last log line of the task
tries to json.loads() it into a Python dictionary
finally returns the output (which is now a dictionary, not a string)

The previous Jinja template {{ xcom_pull('task_a')['key2'] }} now works in Task B, since the XCom value is a Python dictionary.

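import json

from airflow.operators.bash import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator


class BashOperatorExtended(BashOperator):
    def execute(self, context):
        # Run the command as usual; the result is the last log line.
        output = super().execute(context)
        try:
            # Deserialize JSON output before it is pushed to XCom.
            output = json.loads(output)
        except (TypeError, ValueError):
            # Not valid JSON (or not a string): keep the original output.
            pass
        return output


class DockerOperatorExtended(DockerOperator):
    def execute(self, context):
        # Same approach for containers: parse the returned log line if possible.
        output = super().execute(context)
        try:
            output = json.loads(output)
        except (TypeError, ValueError):
            pass
        return output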

But creating a new operator just for this purpose is not really satisfying…

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project’s Code of Conduct

Issue Analytics

State:
Created a year ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction

uranusjr commented, Nov 10, 2022

If the goal is to make Jinja2 templating simpler (there’s no issue if it’s taskflow), the simplest way may be to add a built-in macro for this?

{{ json_loads(xcom_pull('task_a'))['key2'] }}
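Until such a macro is built in, something similar could probably be registered per DAG. The sketch below assumes the DAG's user_defined_macros argument is used for this; json_loads and USER_MACROS are illustrative names, not an existing Airflow macro, and the Airflow wiring is only described in a comment so the snippet runs on its own.

```python
import json


def json_loads(value):
    """Parse a JSON string. Hypothetical helper, not a built-in Airflow macro."""
    return json.loads(value)


# Assumption: this mapping would be passed as DAG(..., user_defined_macros=USER_MACROS),
# which would make json_loads callable from Jinja templates in that DAG.
USER_MACROS = {"json_loads": json_loads}

# Stand-in for what xcom_pull('task_a') returns today: the raw last log line.
pulled = '{"key1":1,"key2":3}'

# Equivalent of the template expression json_loads(xcom_pull('task_a'))['key2'].
value = USER_MACROS["json_loads"](pulled)["key2"]
```

This keeps the operators unchanged and moves the deserialization into the template, at the cost of repeating the call wherever the value is pulled.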

1 reaction

potiuk commented, Oct 24, 2022

The previous Jinja template {{ xcom_pull('task_a')['key2'] }} now works in Task B, since the XCom value is a Python dictionary.

Actually, now that I think of it, this could be made into a common “AbstractOperator” feature. We could add a “deserialize_output” parameter so that any operator can use it. I think we should even deserialize it using YAML, because then we would automatically handle both YAML and JSON (YAML is actually a 100% compatible superset of JSON: every proper JSON document is also valid YAML).

WDYT @uranusjr? I think having it as a common “operator” feature (disabled by default) would be quite powerful and could make a number of existing operators much easier to work with.
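To make the proposal concrete, here is a rough sketch of what such an opt-in step might look like as a standalone function. Nothing here is an existing Airflow API: deserialize_output and its enabled flag are hypothetical, and the snippet assumes PyYAML may or may not be installed, falling back to the standard json module when it is not.

```python
import json

try:
    import yaml  # PyYAML, an optional third-party dependency
except ImportError:
    yaml = None

# Exceptions that mean "this output is not parseable"; YAML errors only if available.
_PARSE_ERRORS = (json.JSONDecodeError,) + ((yaml.YAMLError,) if yaml else ())


def deserialize_output(raw, enabled=False):
    """Hypothetical helper: parse an operator's string output when enabled."""
    if not enabled or not isinstance(raw, str):
        # Disabled by default, and non-string outputs are passed through untouched.
        return raw
    try:
        if yaml is not None:
            # safe_load accepts YAML as well as JSON-style documents.
            return yaml.safe_load(raw)
        return json.loads(raw)
    except _PARSE_ERRORS:
        # Unparseable output is returned unchanged rather than failing the task.
        return raw
```

One consequence of parsing with YAML is worth weighing: output that merely looks like YAML, such as a log line of the form "name: value", would be turned into a dictionary as well, which is part of why keeping the feature disabled by default seems sensible.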