DockerOperator tries to push bytes to XCom, fails to serialize


Apache Airflow version

6f8c204

Environment

  • OS (e.g. from /etc/os-release): Mac OS 11.3
  • Kernel: Darwin Kernel Version 20.4.0
  • Install tools: pip install -e .

The DAG

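from airflow.decorators import dag
from airflow.providers.docker.operators.docker import DockerOperator
from airflow.utils.dates import days_ago


# The report showed a bare @dag; these arguments are assumed so the DAG can be parsed.
@dag(schedule_interval=None, start_date=days_ago(1))
def hello_docker():
    DockerOperator(
        task_id="say_hi",
        image="bash:latest",
        command=[
            "-c",
            "echo Hello World",
        ],
    )


# Instantiate the DAG so that Airflow can discover it.
hello_docker_dag = hello_docker()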

What Happened

The operator tried to push its result to XCom, but threw a serialization error:

[2021-05-19 09:13:40,044] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=docker_mount
AIRFLOW_CTX_TASK_ID=add_one
AIRFLOW_CTX_EXECUTION_DATE=2021-05-19T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=backfill__2021-05-19T00:00:00+00:00
[2021-05-19 09:13:40,904] {xcom.py:228} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your airflow config.
[2021-05-19 09:13:40,904] {taskinstance.py:1481} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1137, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1344, in _execute_task
    self.xcom_push(key=XCOM_RETURN_KEY, value=result)
  File "/Users/matt/src/airflow/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1919, in xcom_push
    XCom.set(
  File "/Users/matt/src/airflow/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/Users/matt/src/airflow/airflow/models/xcom.py", line 79, in set
    value = XCom.serialize_value(value)
  File "/Users/matt/src/airflow/airflow/models/xcom.py", line 226, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
[2021-05-19 09:13:40,906] {taskinstance.py:1524} INFO - Marking task as FAILED. dag_id=docker_mount, task_id=add_one, execution_date=20210519T000000, start_date=20210519T145759, end_date=20210519T151340
[2021-05-19 09:13:40,913] {debug_executor.py:87} ERROR - Failed to execute task: Object of type bytes is not JSON serializable.
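The last frames of the traceback can be reproduced without Airflow or Docker. The sketch below only mimics the `json.dumps(value).encode('UTF-8')` call shown in the traceback; `serialize_value` and `try_serialize` are illustrative helpers written for this example, not Airflow's own functions.

```python
import json


def serialize_value(value):
    """Mimic the JSON-based serialization step seen in the traceback."""
    return json.dumps(value).encode("UTF-8")


def try_serialize(value):
    """Return the serialized value, or the error message if serialization fails."""
    try:
        return serialize_value(value)
    except TypeError as exc:
        return str(exc)


# A bytes payload fails the same way the operator's return value did.
bytes_result = try_serialize(b"Hello World")

# The same payload as a str serializes without error.
str_result = try_serialize("Hello World")
```

Under these assumptions, the failure depends only on the type of the value handed to XCom, which is why the comments below focus on whether the operator returns bytes or str.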

What I Expected to Happen

Something useful ends up in XCom, with no errors.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

potiuk commented, Jul 18, 2021 (1 reaction)

I think it is somewhat ambiguous whether the Docker APIs return bytes or strings. Apparently this has been a problem before: if you look a few lines up, a conditional decode happens (and that's why we have line.encode("utf-8") that should stay there).

            for line in lines:
                line = line.strip()
                if hasattr(line, 'decode'):
                    # Note that lines returned can also be byte sequences so we have to handle decode here
                    line = line.decode('utf-8')
                res_lines.append(line)

Not nice.

I looked at the APIs at https://docker-py.readthedocs.io/en/1.2.3/api/ and both attach and logs are documented as returning either a str or a generator. The docs do not say what type the generator yields (and we are using stream=True in attach, so I guess we get the generator, not a str). So it is effectively unspecified whether the generator yields bytes or strings, and it is reasonable to handle both cases.

Interestingly, the documentation says that the attach method is really a wrapper around logs(), and when you look closer at the method, it seems that what we are trying to do is retrieve the logs that we have already retrieved (and stored in res_lines as a list of str). So it is likely that we receive the same generator again when we call logs after earlier calling attach with stream=True.

So I think the proper fix is actually:

ret = res_lines if self.xcom_all else line

However, I wonder whether there was a good reason why the line was encoded to bytes before being pushed to XCom. I think if we fix it, we should make a major release of the Docker operator.
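A minimal sketch of that idea, outside of Airflow: normalize every streamed line to str using the same conditional decode quoted above, then return either the whole list or the last line. The function names `collect_lines` and `xcom_value` are hypothetical and exist only for this illustration; the real operator reads its lines from the Docker client rather than from a plain list.

```python
import json


def collect_lines(lines):
    """Strip each line and decode it if it is a byte sequence."""
    res_lines = []
    for line in lines:
        line = line.strip()
        if hasattr(line, "decode"):
            # Lines may arrive as bytes or as str, so decode only when needed.
            line = line.decode("utf-8")
        res_lines.append(line)
    return res_lines


def xcom_value(lines, xcom_all):
    """Return all lines when xcom_all is set, otherwise only the last line."""
    res_lines = collect_lines(lines)
    if xcom_all:
        return res_lines
    return res_lines[-1] if res_lines else None


# Mixed bytes and str input, as the comment suggests can happen.
mixed = [b"Hello\n", "World\n"]
all_lines = xcom_value(mixed, xcom_all=True)
last_line = xcom_value(mixed, xcom_all=False)

# Both results contain only str, so JSON serialization succeeds.
serialized = json.dumps(all_lines)
```

Because both branches return values built from already-decoded strings, neither one hands bytes to the JSON serializer.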

EtsuNDmA commented, Jul 16, 2021 (1 reaction)

https://github.com/apache/airflow/blob/bc004151ed6924ee7bec5d9d047aedb4873806da/airflow/providers/docker/operators/docker.py#L316

I think we should cast the result of self.cli.logs() to a string, rather than encoding the line:

ret = self.cli.logs(container=self.container['Id']).decode('utf-8') if self.xcom_all else line
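The decode-the-logs variant can be sketched as follows, without a Docker daemon. `FakeDockerClient` is a hypothetical stand-in that returns bytes from `logs()`, which is the behavior this comment assumes; `logs_as_text` is likewise an illustrative helper. The `isinstance` check is an addition here so that the helper also tolerates a client that already returns str.

```python
import json


class FakeDockerClient:
    """Hypothetical stand-in for the Docker client used by the operator."""

    def logs(self, container):
        # Assumed behavior: the full container log comes back as bytes.
        return b"Hello World\n"


def logs_as_text(cli, container_id):
    """Fetch the container logs and make sure the result is a str."""
    raw = cli.logs(container=container_id)
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


text = logs_as_text(FakeDockerClient(), "abc123")

# A str value passes through JSON serialization without error.
serialized = json.dumps(text)
```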

