DockerOperator tries to push bytes to XCom, fails to serialize
Apache Airflow version
6f8c204
Environment
OS (e.g. from /etc/os-release): macOS 11.3
Kernel: Darwin Kernel Version 20.4.0
Install tools: pip install -e .
The DAG
from airflow.decorators import dag
from airflow.providers.docker.operators.docker import DockerOperator

@dag
def hello_docker():
    DockerOperator(
        task_id="say_hi",
        image="bash:latest",
        command=[
            "-c",
            "echo Hello World",
        ],
    )

hello_docker()
What Happened
The operator tried to push its output to XCom, but threw a serialization error:
[2021-05-19 09:13:40,044] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=docker_mount
AIRFLOW_CTX_TASK_ID=add_one
AIRFLOW_CTX_EXECUTION_DATE=2021-05-19T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=backfill__2021-05-19T00:00:00+00:00
[2021-05-19 09:13:40,904] {xcom.py:228} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your airflow config.
[2021-05-19 09:13:40,904] {taskinstance.py:1481} ERROR - Task failed with exception
Traceback (most recent call last):
File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1137, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1344, in _execute_task
self.xcom_push(key=XCOM_RETURN_KEY, value=result)
File "/Users/matt/src/airflow/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/Users/matt/src/airflow/airflow/models/taskinstance.py", line 1919, in xcom_push
XCom.set(
File "/Users/matt/src/airflow/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/Users/matt/src/airflow/airflow/models/xcom.py", line 79, in set
value = XCom.serialize_value(value)
File "/Users/matt/src/airflow/airflow/models/xcom.py", line 226, in serialize_value
return json.dumps(value).encode('UTF-8')
File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
[2021-05-19 09:13:40,906] {taskinstance.py:1524} INFO - Marking task as FAILED. dag_id=docker_mount, task_id=add_one, execution_date=20210519T000000, start_date=20210519T145759, end_date=20210519T151340
[2021-05-19 09:13:40,913] {debug_executor.py:87} ERROR - Failed to execute task: Object of type bytes is not JSON serializable.
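The failure can be reproduced outside Airflow. The default XCom backend JSON-encodes the task's return value, and json.dumps rejects raw bytes. This is a minimal sketch of that serialization step (modeled on XCom.serialize_value from the traceback, not Airflow's actual code path):

```python
import json


def serialize_xcom_value(value):
    # Mirrors what xcom.py line 226 does in the traceback above:
    # JSON-encode the value, then encode the result to UTF-8 for storage.
    return json.dumps(value).encode("UTF-8")


# A str round-trips fine...
serialize_xcom_value("Hello World")

# ...but bytes (what DockerOperator pushed here) do not.
try:
    serialize_xcom_value(b"Hello World")
except TypeError as exc:
    print(exc)  # Object of type bytes is not JSON serializable
```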
What I Expected to Happen
Something useful ends up in XCom, with no errors.
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 8 (7 by maintainers)
Top GitHub Comments
I think it is somewhat ambiguous whether Docker's APIs return bytes or strings. Apparently this has been a problem before: if you look a few lines up, a conditional decode happens (and that's why we have line.encode("utf-8"), which should stay there).
Not nice.
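Since docker-py may hand back either bytes or str, one defensive pattern for the conditional decode mentioned above is to normalize every log line before using it. This is a sketch, not the committed fix, and ensure_str is a hypothetical helper name:

```python
def ensure_str(line):
    """Return line as str, decoding UTF-8 if the Docker API gave us bytes.

    Hypothetical helper: docker-py's attach()/logs() are loosely specified
    and may yield bytes or str depending on version and stream settings,
    so callers must be prepared for both.
    """
    if isinstance(line, bytes):
        return line.decode("utf-8", errors="replace")
    return line


# Both shapes normalize to the same str value.
ensure_str(b"Hello World")
ensure_str("Hello World")
```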
I looked at the APIs (https://docker-py.readthedocs.io/en/1.2.3/api/) and it is mentioned for both attach and logs that they return either str or a generator. It does not say a generator of what type (and we are using stream=True in attach, so I guess we get the generator, not a str). So it's kind of unspecified whether the generator yields bytes or strings, and it's "OK-ish" to try to react to both cases.

Interestingly, the documentation says that the attach method is really a wrapper around logs(). When you look closer at the method, it seems what we are trying to do is retrieve logs that we already retrieved (and stored in res_lines as an array of str). So it is likely that we actually receive the same generator again when we call logs() after earlier calling attach() with stream=True. So I think the proper fix is actually:

However, I wonder whether there was a good reason why the line was encoded to bytes before pushing to XCom? I think if we fix it, we should make a major release of the Docker provider.
https://github.com/apache/airflow/blob/bc004151ed6924ee7bec5d9d047aedb4873806da/airflow/providers/docker/operators/docker.py#L316 I think we should cast self.cli.logs() to a string, rather than encoding the line.
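A sketch of that suggestion, simplified out of the operator for illustration: the function below stands in for the step where DockerOperator prepares the return value of self.cli.logs() before it is pushed to XCom (xcom_value_from_logs is a hypothetical name, and raw_logs stands in for the APIClient.logs() result):

```python
def xcom_value_from_logs(raw_logs):
    """Cast the output of APIClient.logs() to str before the XCom push.

    Sketch only: docker-py may return bytes here, and the default
    JSON-based XCom serialization requires str, so decode instead of
    encoding the line to bytes as the current code does.
    """
    if isinstance(raw_logs, bytes):
        return raw_logs.decode("utf-8", errors="replace")
    return str(raw_logs)


# bytes from the Docker daemon become a JSON-serializable str.
xcom_value_from_logs(b"Hello World\n")
```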