Registration of custom (de)serializer is recognized by dask_loads/dask_dumps, but not when computing a graph

I've created an object wrapper type to carry metadata alongside otherwise typical data objects such as NumPy arrays. After registering the associated (de)serializers, the Distributed serialization functions recognize the registration when called directly, but during execution of the computational graph the registration is ignored and pickle is used instead.
Self-contained demonstration of the issue (for whatever reason, the code must live in two separate module files):
main.py

```python
from imageobject import ImageObject
from distributed.protocol.serialize import dask_loads, dask_dumps
import numpy as np
from dask.distributed import Client, LocalCluster
import dask.bag as db


def get_imageobject(*args, **kwargs):
    return ImageObject(np.zeros(6))


if __name__ == '__main__':
    # #######################
    # serialize manually
    im = get_imageobject()
    im_ser = dask_loads(*dask_dumps(im))
    print(np.all(im == im_ser))

    # #########################################
    # serialize through Dask graph execution
    cluster = LocalCluster()
    with Client(cluster) as client:
        im_ser = client.compute(
            db.from_sequence([None, None, None]).map(get_imageobject),
            sync=True
        )
        print(np.all(im == im_ser))
```
imageobject.py

```python
import wrapt
import dill
from typing import Any, Dict, Tuple, List, Union
from distributed.protocol.serialize import dask_serialize, dask_deserialize


class ImageObject(wrapt.ObjectProxy):
    def __init__(self, object_to_wrap: Any):
        super().__init__(object_to_wrap)

    def serialize(self):
        print('ser')
        obj_dict = dict(
            dataobject_type=type(self),
            object_to_wrap=self.__wrapped__,
        )
        return dill.dumps(obj_dict)

    @classmethod
    def deserialize(cls, serialized_dict: Union[bytes, str]):
        print('deser')
        obj_dict = dill.loads(serialized_dict)
        dataobject_type = obj_dict['dataobject_type']
        object_to_wrap = obj_dict['object_to_wrap']
        return dataobject_type(object_to_wrap)


def dask_serialize_imageobject(dataobject: ImageObject) -> Tuple[Dict, List[bytes]]:
    header = {}
    frames = [dataobject.serialize()]
    return header, frames


def dask_deserialize_imageobject(header: Dict, frames: List[bytes]) -> ImageObject:
    return ImageObject.deserialize(frames[0])


dask_serialize.register(ImageObject)(dask_serialize_imageobject)
dask_deserialize.register(ImageObject)(dask_deserialize_imageobject)
```
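For reference, a registered pair like the one above only has to round-trip a `(header, frames)` tuple. A minimal self-contained sketch of that contract, with stdlib `pickle` standing in for `dill` and a plain class standing in for the wrapt proxy (all names here are hypothetical):

```python
import pickle
from typing import Any, Dict, List, Tuple


class FakeImage:
    """Hypothetical stand-in for ImageObject (no wrapt dependency)."""
    def __init__(self, data: Any):
        self.data = data


def serialize_fake(obj: FakeImage) -> Tuple[Dict, List[bytes]]:
    # The header carries small metadata; the frames carry raw bytes.
    return {"type": "FakeImage"}, [pickle.dumps(obj.data)]


def deserialize_fake(header: Dict, frames: List[bytes]) -> FakeImage:
    return FakeImage(pickle.loads(frames[0]))


header, frames = serialize_fake(FakeImage([1, 2, 3]))
restored = deserialize_fake(header, frames)
print(restored.data)  # [1, 2, 3]
```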
Execution output:

```
/scratch/anaconda3/envs/beads/bin/python /data/dtk-pipeline/scripts/test_dask_serialization.py
ser
deser
True
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/core.py", line 132, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 184, in deserialize
    return loads(header, frames)
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 81, in serialization_error_loads
    raise TypeError(msg)
TypeError: Could not serialize object of type list.
Traceback (most recent call last):
  File "/scratch/anaconda3/envs/beads/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 38, in dumps
    result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: can't pickle ImageObject objects
```
This pair of tracebacks keeps repeating in an endless loop.
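Since the traceback ends with plain pickle failing on the wrapper, one workaround worth noting is to make the wrapper picklable itself via `__reduce__`, independently of the Dask registration. A hedged, stdlib-only sketch using a simplified stand-in class (no wrapt; the class and field names are hypothetical):

```python
import pickle


class PicklableImage:
    """Hypothetical simplified wrapper: carries data plus metadata."""
    def __init__(self, data, metadata=None):
        self.data = data
        self.metadata = metadata or {}

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call the class with these args.
        return (self.__class__, (self.data, self.metadata))


im = PicklableImage([0.0] * 6, {"origin": "camera-1"})
roundtripped = pickle.loads(pickle.dumps(im))
print(roundtripped.data == im.data)          # True
print(roundtripped.metadata == im.metadata)  # True
```

Whether `__reduce__` composes cleanly with `wrapt.ObjectProxy` is a separate question; the sketch only shows the pickle-side mechanism.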
Issue Analytics
- State:
- Created 5 years ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
@milesgranger I’ve been having the same problem, so after a bit of searching I believe this section in the docs implies you cannot define custom serialization for “computational tasks”:

It’s unclear to me in (3) whether the computational task refers to just `f`, or `f` and its argument `x`. If just `f`, then what is responsible for serializing `x`? If `f` and `x`, how would one customize their serialization? I think this issue is related:
Will always raise the `ValueError` set in `Foo`. I think I have followed the example here pretty closely, swapping between extending the dask family or registering my own with `register_serialization_family`, or not, or both, or different permutations of (de)serializers using `dask`, `msgpack`, etc. Thanks!