Optional types with non-default serialization strategies do not load intermediates correctly
When specifying a custom type (with a custom SerializationStrategy) as Optional in a solid’s input, it seems that the custom deserializer does not propagate all the way to the intermediate storage loader, causing the following error:
_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/api.py", line 663, in _pipeline_execution_iterator
for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/executor/in_process.py", line 36, in execute
for event in inner_plan_execution_iterator(pipeline_context, execution_plan):
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 78, in inner_plan_execution_iterator
_dagster_event_sequence_for_step(step_context, retries)
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 290, in _dagster_event_sequence_for_step
raise unexpected_exception
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 214, in _dagster_event_sequence_for_step
for step_event in check.generator(step_events):
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 261, in core_dagster_event_sequence_for_step
for input_name, input_value in _input_values_from_intermediate_storage(step_context):
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 479, in _input_values_from_intermediate_storage
dagster_type=step_input.dagster_type,
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 147, in get_intermediate
return self.get_intermediate_object(dagster_type, step_output_handle)
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 126, in get_intermediate_object
key, serialization_strategy=dagster_type.serialization_strategy
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/object_store.py", line 143, in get_object
obj = serialization_strategy.deserialize_from_file(key)
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/types/marshal.py", line 62, in deserialize_from_file
return self.deserialize(read_obj)
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/types/marshal.py", line 73, in deserialize
return pickle.load(read_file_obj)
I was able to recreate the UnpicklingError by trying to unpickle the intermediate file (which was properly serialized by my custom serialization strategy) with the default unpickler. Furthermore, removing the Optional wrapper from the solid input type makes this intermediate deserialization error go away.
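For anyone who wants to see the same class of failure outside of Dagster, here is a minimal standard-library sketch. It assumes the custom strategy writes pickles that contain persistent IDs, which is what the error message suggests; `Handle`, `HandlePickler`, `HandleUnpickler`, `custom_dumps`, and `custom_loads` are hypothetical names made up for illustration and are not part of Dagster or of my actual strategy.

```python
import io
import pickle


class Handle:
    """Hypothetical object that is stored by reference instead of by value."""

    def __init__(self, name):
        self.name = name


class HandlePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent ID for Handle objects; pickle everything else normally.
        if isinstance(obj, Handle):
            return ("handle", obj.name)
        return None


class HandleUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the persistent ID written by HandlePickler.
        tag, name = pid
        if tag != "handle":
            raise pickle.UnpicklingError("unsupported persistent id")
        return Handle(name)


def custom_dumps(obj):
    buf = io.BytesIO()
    HandlePickler(buf).dump(obj)
    return buf.getvalue()


def custom_loads(data):
    return HandleUnpickler(io.BytesIO(data)).load()


payload = custom_dumps({"model": Handle("weights-v1")})

# The matching custom unpickler reads the payload without trouble.
restored = custom_loads(payload)

# The default unpickler has no persistent_load, so it raises UnpicklingError.
try:
    pickle.loads(payload)
    default_error = None
except pickle.UnpicklingError as exc:
    default_error = exc
```

The exact error text varies between Python versions and between the C and pure-Python pickle implementations, but in each case the default loader raises `pickle.UnpicklingError` on a payload that the custom loader handles.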
Perhaps the serialization_strategy in
File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 126, in get_intermediate_object
key, serialization_strategy=dagster_type.serialization_strategy
is coming from Optional instead of the wrapped type? I use Optional with other custom types, but those use the default pickling serialization strategy, so I haven’t noticed this error there.
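To make the hypothesis concrete, here is a toy model of what I suspect is happening. It does not use or mirror Dagster's real classes; `ToyStrategy`, `ToyType`, `ForgetfulOptional`, and `DelegatingOptional` are all hypothetical names, and the only point is to show how a wrapper type could end up reporting the default strategy instead of the wrapped type's.

```python
class ToyStrategy:
    """Hypothetical stand-in for a serialization strategy."""

    def __init__(self, name="default-pickle"):
        self.name = name


class ToyType:
    """Hypothetical stand-in for a type that carries a serialization strategy."""

    def __init__(self, name, serialization_strategy=None):
        self.name = name
        # Fall back to the default strategy when none is given.
        self.serialization_strategy = serialization_strategy or ToyStrategy()


class ForgetfulOptional(ToyType):
    """Wrapper that never looks at the inner strategy (the suspected behavior)."""

    def __init__(self, inner_type):
        super().__init__("Optional[{}]".format(inner_type.name))
        self.inner_type = inner_type


class DelegatingOptional(ToyType):
    """Wrapper that forwards the inner type's strategy (the expected behavior)."""

    def __init__(self, inner_type):
        super().__init__(
            "Optional[{}]".format(inner_type.name),
            serialization_strategy=inner_type.serialization_strategy,
        )
        self.inner_type = inner_type


custom = ToyType("MyCustomType", serialization_strategy=ToyStrategy("custom"))

forgetful = ForgetfulOptional(custom)
delegating = DelegatingOptional(custom)

# A loader that reads `some_type.serialization_strategy` gets different answers.
print(forgetful.serialization_strategy.name)
print(delegating.serialization_strategy.name)
```

If the real Optional wrapper behaves like `ForgetfulOptional`, the loader would pick the default pickle strategy for a file written with the custom one, which would match the traceback above.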
Also: Thanks for making such an awesome platform for data engineering!
Issue Analytics
- Created 3 years ago
- Comments: 8 (8 by maintainers)
Ahhh, I see what you mean by making the type itself optional; thanks! I’ll definitely use that pattern in the future if I encounter this case again.
No worries on the delay. Since originally posting about this unexpected behavior, I’ve used it as a sign of overcomplication and reworked the DAG a bit to clean up the inputs and better separate concerns. The new pattern has the unintended benefit of simply using solid selection syntax to control execution instead of relying on None-valued inputs, which feels better all around!
Got it - I’m going to close this out in that case. Please reach out if you run into more issues!