
Registration of custom (de)serializer is recognized by dask_loads, dask_dumps, but not when computing graph

See original GitHub issue

I’ve created an object wrapper type to carry metadata along with otherwise typical data objects such as NumPy arrays. After registering the associated (de)serializers, the Distributed serialization functions recognize the registration when called directly, but during execution of the computational graph the registration is ignored and pickle is used instead.

Self-contained demonstration of the issue:

For whatever reason, the code must be split across two separate module files for the issue to reproduce.

main.py

from imageobject import ImageObject
from distributed.protocol.serialize import dask_loads, dask_dumps

import numpy as np

from dask.distributed import Client, LocalCluster

import dask.bag as db


def get_imageobject(*args, **kwargs):
    return ImageObject(np.zeros(6))


if __name__ == '__main__':

    # #######################
    # serialize manually

    im = get_imageobject()

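    # round-trip the object directly through the registered dask serializer/deserializer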
    im_ser = dask_loads(*dask_dumps(im))

    print(np.all(im == im_ser))

    # #########################################
    # serialize through Dask graph execution

    cluster = LocalCluster()

    with Client(cluster) as client:
        im_ser = client.compute(
            db.from_sequence([None, None, None]).map(get_imageobject),
            sync=True
        )

    print(np.all(im == im_ser))

imageobject.py

import wrapt
import dill

from typing import Any, Dict, Tuple, List, Union
from distributed.protocol.serialize import dask_serialize, dask_deserialize


class ImageObject(wrapt.ObjectProxy):
    def __init__(self, object_to_wrap: Any):
        super().__init__(object_to_wrap)

    def serialize(self):
        print('ser')

        obj_dict = dict(
            dataobject_type=type(self),
            object_to_wrap=self.__wrapped__,
        )

        return dill.dumps(obj_dict)

    @classmethod
    def deserialize(cls, serialized_dict: Union[bytes, str]):
        print('deser')

        obj_dict = dill.loads(serialized_dict)

        dataobject_type = obj_dict['dataobject_type']
        object_to_wrap = obj_dict['object_to_wrap']

        return dataobject_type(object_to_wrap)


def dask_serialize_imageobject(dataobject: ImageObject) -> Tuple[Dict, List[bytes]]:
    header = {}
    frames = [dataobject.serialize()]

    return header, frames


def dask_deserialize_imageobject(header: Dict, frames: List[bytes]) -> ImageObject:
    return ImageObject.deserialize(frames[0])


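# register the custom (de)serializers with Dask's serialization dispatch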
dask_serialize.register(ImageObject)(dask_serialize_imageobject)
dask_deserialize.register(ImageObject)(dask_deserialize_imageobject)

Execution output:

/scratch/anaconda3/envs/beads/bin/python /data/dtk-pipeline/scripts/test_dask_serialization.py
ser
deser
True
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/core.py", line 132, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 184, in deserialize
    return loads(header, frames)
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 81, in serialization_error_loads
    raise TypeError(msg)
TypeError: Could not serialize object of type list.
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 38, in dumps
    result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: can't pickle ImageObject objects

This error just keeps popping up in an endless loop.
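A quick way to confirm that the registration itself works (a diagnostic sketch, assuming distributed's serialize() accepts a serializers list, as it did in versions from around the time of this report):

import numpy as np
from distributed.protocol.serialize import serialize
from imageobject import ImageObject

# Restrict serialization to the 'dask' family; the returned header should
# identify the serializer actually used, making any pickle fallback visible.
header, frames = serialize(ImageObject(np.zeros(6)), serializers=['dask'])
print(header)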

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

3 reactions
calebho commented, May 20, 2019

@milesgranger I’ve been having the same problem, so after a bit of searching I believe this section in the docs implies you cannot define custom serialization for “computational tasks”:

There are three kinds of messages passed through the Dask network:

  1. Small administrative messages like “Worker A has finished task X” or “I’m running out of memory”. These are always serialized with msgpack.
  2. Movement of program data, such as Numpy arrays and Pandas dataframes. This uses a combination of pickle and custom serializers and is the topic of the next section
  3. Computational tasks like f(x) that are defined and serialized on client processes and deserialized and run on worker processes. These are serialized using a fixed scheme decided on by those libraries. Today this is a combination of pickle and cloudpickle.

It’s unclear to me in (3) whether the computation task refers to just f or f and its argument x. If just f, then what is responsible for serializing x? If f and x, how would one customize their serialization?
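For what it's worth, one workaround that seems consistent with that reading is to keep x out of the task definition entirely and move it as data with Client.scatter, since scattered data travels along path (2), where custom serializers apply. A hedged sketch against the ImageObject reproduction above (assuming a connected client as in main.py):

import numpy as np
from imageobject import ImageObject

# Move the object through the data channel, where the registered 'dask'
# serializer should handle it, instead of embedding it in the task graph,
# which is serialized with pickle/cloudpickle.
im_future = client.scatter(ImageObject(np.zeros(6)))
result = client.submit(np.sum, im_future)  # the worker receives the deserialized object
print(result.result())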

2 reactions
milesgranger commented, Jan 24, 2019

Think this issue is related:

from dask import delayed
from distributed import Client
from distributed.protocol.serialize import (
    dask_serialize,
    dask_deserialize,
    register_serialization_family,
)


class Foo:
    """Some class which **cannot** be pickled"""
    def __init__(self, bar):
        self.bar = bar

    def __setstate__(self, state):
        raise ValueError('Seriously, I cannot be pickled!')


@dask_serialize.register(Foo)
def special_serializer(x, *args, **kwargs):
    # ... magic way of serializing Foo into List[bytes]
    return {'serializer': 'special_serde'}, serialized_foo


@dask_deserialize.register(Foo)
def special_deserializer(header, frames):
    # ... magic way of deserializing into Foo
    return deserialized_foo


register_serialization_family('special_serde', special_serializer, special_deserializer)
client = Client(serializers=['dask', 'special_serde'],
                deserializers=['dask', 'special_serde'],
                processes=False)


@delayed
def some_func(_foo):
    return 1 + 1


val = some_func(Foo(2))
val.compute()

Will always raise the ValueError set in Foo. I think I have followed the example here pretty closely, swapping between extending the dask family, registering my own with register_serialization_family, both, and different permutations of (de)serializers using dask, msgpack, etc.
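One detail that explains why the ValueError appears on the worker rather than the client: Foo pickles without complaint (the default reducer is used), and __setstate__ only fires on the unpickling side. A minimal sketch:

import pickle

foo = Foo(2)
blob = pickle.dumps(foo)  # succeeds; __setstate__ plays no role when dumping
pickle.loads(blob)        # raises ValueError('Seriously, I cannot be pickled!')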

Thanks!
