Optional types with non-default serialization strategies do not load intermediates correctly

When specifying a custom type (with a custom SerializationStrategy) as Optional in a solid’s input, it seems that the custom deserializer does not propagate all the way to the intermediate storage loader, causing the following error:

_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/api.py", line 663, in _pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/executor/in_process.py", line 36, in execute
    for event in inner_plan_execution_iterator(pipeline_context, execution_plan):
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 78, in inner_plan_execution_iterator
    _dagster_event_sequence_for_step(step_context, retries)
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 290, in _dagster_event_sequence_for_step
    raise unexpected_exception
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 214, in _dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 261, in core_dagster_event_sequence_for_step
    for input_name, input_value in _input_values_from_intermediate_storage(step_context):
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 479, in _input_values_from_intermediate_storage
    dagster_type=step_input.dagster_type,
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 147, in get_intermediate
    return self.get_intermediate_object(dagster_type, step_output_handle)
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 126, in get_intermediate_object
    key, serialization_strategy=dagster_type.serialization_strategy
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/object_store.py", line 143, in get_object
    obj = serialization_strategy.deserialize_from_file(key)
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/types/marshal.py", line 62, in deserialize_from_file
    return self.deserialize(read_obj)
  File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/types/marshal.py", line 73, in deserialize
    return pickle.load(read_file_obj)

I was able to recreate the UnpicklingError by trying to unpickle the intermediate file (which had been properly serialized by my custom serialization strategy) with plain pickle. Furthermore, removing the Optional wrapper from the solid input type makes this intermediate deserialization error go away.
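For readers who want to see this error outside of Dagster, here is a minimal standard-library sketch. It assumes the custom strategy writes pickles that rely on persistent IDs, which is one common way to get this exact message. The issue does not show the actual strategy, so `Handle`, `RefPickler`, `RefUnpickler`, and the helper functions are hypothetical names used only for illustration.

```python
import io
import pickle


class Handle:
    """Hypothetical stand-in for an object stored by reference, not by value."""

    def __init__(self, key):
        self.key = key


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # A non-None return value makes pickle emit a persistent-id
        # instruction instead of serializing the object itself.
        if isinstance(obj, Handle):
            return ("Handle", obj.key)
        return None


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the persistent id written by RefPickler.
        tag, key = pid
        if tag != "Handle":
            raise pickle.UnpicklingError("unsupported persistent id: %r" % (pid,))
        return Handle(key)


def dump_with_refs(obj):
    """Serialize with the custom pickler (the 'custom strategy' in this sketch)."""
    buf = io.BytesIO()
    RefPickler(buf).dump(obj)
    return buf.getvalue()


def load_with_refs(data):
    """Deserialize with the matching custom unpickler."""
    return RefUnpickler(io.BytesIO(data)).load()


def load_default(data):
    """Deserialize the way a default pickle-based strategy would."""
    return pickle.loads(data)


payload = dump_with_refs({"model": Handle("weights-001")})

# The matching unpickler restores the object.
restored = load_with_refs(payload)

# The default loader cannot resolve the persistent id and raises.
try:
    load_default(payload)
    default_error = None
except pickle.UnpicklingError as exc:
    default_error = exc
```

If the real strategy works differently, the takeaway is the same: data written by one serializer and read back by the default `pickle.load` can fail in this way.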

Perhaps the serialization_strategy in

File "/home/ml/virtualenv/lib/python3.7/site-packages/dagster/core/storage/intermediate_storage.py", line 126, in get_intermediate_object
    key, serialization_strategy=dagster_type.serialization_strategy

is coming from Optional instead of the wrapped type? I use Optional with other custom types, but those use the default pickling serialization strategy so I haven’t noticed this error there.
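To make the hypothesis concrete, here is a toy model of it. This is not Dagster's code: `ToyType`, `ToyOptional`, the two strategy classes, and `effective_strategy` are all hypothetical. The sketch only shows how a wrapper type that does not delegate to its inner type would hand back the default strategy, and how unwrapping before the lookup would avoid that.

```python
class PickleStrategy:
    """Toy default strategy."""
    name = "pickle"


class CustomStrategy:
    """Toy custom strategy."""
    name = "custom"


class ToyType:
    def __init__(self, name, serialization_strategy=None):
        self.name = name
        # Fall back to the default strategy when none is given.
        self.serialization_strategy = serialization_strategy or PickleStrategy()


class ToyOptional(ToyType):
    def __init__(self, inner_type):
        # The wrapper never consults inner_type.serialization_strategy,
        # so it ends up with the default one.
        super().__init__("Optional[%s]" % inner_type.name)
        self.inner_type = inner_type


def effective_strategy(toy_type):
    """Unwrap toy optionals before choosing a strategy."""
    while isinstance(toy_type, ToyOptional):
        toy_type = toy_type.inner_type
    return toy_type.serialization_strategy


custom = ToyType("Model", CustomStrategy())
wrapped = ToyOptional(custom)

naive_name = wrapped.serialization_strategy.name      # strategy seen by a naive lookup
unwrapped_name = effective_strategy(wrapped).name     # strategy of the wrapped type
```

In this model the naive lookup on the wrapper returns the pickle strategy while the unwrapped lookup returns the custom one, which would match the behaviour described above if the hypothesis holds.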

Also: Thanks for making such an awesome platform for data engineering!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
trevenrawr commented, Oct 14, 2020

Ahhh, I see what you mean by making the type itself optional; thanks! I’ll definitely use that pattern in the future if I encounter this case again.

No worries on the delay. Since originally posting about this unexpected behavior, I’ve used it as a sign of overcomplication and reworked the DAG a bit to clean up the inputs and better separate concerns. The new pattern has the unintended benefit of simply using solid selection syntax to control execution instead of relying on None-valued inputs, which feels better all around!

0 reactions
sryza commented, Oct 14, 2020

Got it - I’m going to close this out in that case. Please reach out if you run into more issues!
