can't pickle local object when calling to_hdf5 with dask.distributed
I'm playing around with the to_hdf5 command, following the steps shown here: http://dask.pydata.org/en/latest/array-creation.html
When I try to save the dask array to an HDF5 file, I get the following error:
distributed.protocol.pickle - INFO - Failed to serialize (<function insert_to_ooc.<locals>.store at 0x7f72d2d1cc80>, (<function apply at 0x7f72f41b6840>, <function partial_by_order at 0x7f72d4787b70>, [(<function arange at 0x7f72e4044bf8>, 0, 3, 1, 3, dtype('int64'))], {'function': <built-in function pow>, 'other': [(1, 2)]}), (slice(0, 3, None),), <unlocked _thread.lock object at 0x7f72d2c25e40>, None)
Traceback (most recent call last):
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 41, in dumps
result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
AttributeError: Can't pickle local object 'insert_to_ooc.<locals>.store'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 54, in dumps
return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 706, in dumps
cp.dump(obj)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 270, in save_function
self.save_function_tuple(obj)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 312, in save_function_tuple
save((code, closure, base_globals))
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 604, in save_reduce
save(state)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/home/tangy/anaconda3/envs/ipykernel_py3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle h5py.h5d.DatasetID objects
distributed.protocol.core - CRITICAL - Failed to Serialize
[traceback identical to the one above, again ending in: TypeError: can't pickle h5py.h5d.DatasetID objects]
distributed.comm.utils - INFO - Unserializable Message: [{'op': 'update-graph', 'tasks': {"('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": <Serialize: (<function insert_to_ooc.<locals>.store at 0x7f72d2d1cc80>, (<function apply at 0x7f72f41b6840>, <function partial_by_order at 0x7f72d4787b70>, [(<function arange at 0x7f72e4044bf8>, 0, 3, 1, 3, dtype('int64'))], {'function': <built-in function pow>, 'other': [(1, 2)]}), (slice(0, 3, None),), <unlocked _thread.lock object at 0x7f72d2c25e40>, None)>, "('store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": <Serialize: ('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 0)>, "('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": <Serialize: (<function insert_to_ooc.<locals>.store at 0x7f72d2d1cc80>, (<function apply at 0x7f72f41b6840>, <function partial_by_order at 0x7f72d4787b70>, [(<function arange at 0x7f72e4044bf8>, 3, 6, 1, 3, dtype('int64'))], {'function': <built-in function pow>, 'other': [(1, 2)]}), (slice(3, 6, None),), <unlocked _thread.lock object at 0x7f72d2c25e40>, None)>, "('store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": <Serialize: ('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 1)>}, 'dependencies': {"('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": [], "('store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": ["('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 0)"], "('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": [], "('store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": ["('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 1)"]}, 'keys': ["('store-pow-962eaabdc778ca6372a23fde28e7916c', 1)", "('store-pow-962eaabdc778ca6372a23fde28e7916c', 0)"], 'restrictions': {}, 'loose_restrictions': None, 'priority': {"('store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": 0, "('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 0)": 1, "('store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": 2, 
"('arange-pow-store-pow-962eaabdc778ca6372a23fde28e7916c', 1)": 3}, 'resources': None}]
distributed.comm.utils - ERROR - can't pickle h5py.h5d.DatasetID objects
[traceback identical to the one above]
distributed.batched - ERROR - Error in batched write
[traceback identical to the one above, with additional tornado/comm frames at the top, ending in: TypeError: can't pickle h5py.h5d.DatasetID objects]
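The log contains two independent, general pickling limitations, and both can be reproduced with the standard library alone: pickle serializes functions by qualified name, so a closure defined inside another function (like insert_to_ooc.<locals>.store) cannot be looked up when unpickling, and objects wrapping OS-level resources (a threading.Lock here, standing in for the open h5py.h5d.DatasetID handle) refuse to pickle outright. A minimal sketch — the function names below are illustrative, not dask's actual implementation:

```python
import pickle
import threading

def insert_to_ooc():
    # A nested function, analogous to dask's insert_to_ooc.<locals>.store:
    # pickle serializes functions by qualified name, and a qualname
    # containing '<locals>' cannot be resolved at unpickling time.
    def store(x):
        return x
    return store

# Failure mode 1: a local (nested) function
try:
    pickle.dumps(insert_to_ooc())
    local_func_failed = False
except (AttributeError, pickle.PicklingError):
    local_func_failed = True

# Failure mode 2: an object wrapping an OS-level resource, analogous to
# the open h5py dataset handle (and the _thread.lock) in the traceback
try:
    pickle.dumps(threading.Lock())
    handle_failed = False
except TypeError:
    handle_failed = True

print(local_func_failed, handle_failed)  # → True True
```

Both failures show up in the log because distributed must ship the task graph — including the store closure and the dataset handle it closes over — to the scheduler over the network.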
Here is the code:

import dask.array as da
import numpy as np
import distributed

client = distributed.Client()

x = da.arange(6, chunks=3)
y = x ** 2
np.array(y)    # works
y.compute()    # works
da.to_hdf5('myfile.hdf5', '/y', y)  # raises the error above
I am running a conda environment with Python 3.6 installed. All schedulers, workers, and clients are running within this environment.
Issue Analytics
- State:
- Created 7 years ago
- Comments:5 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Unfortunately, the issue still exists.
When I run the code, it works fine when the distributed scheduler is disabled, but crashes when I enable it. Here is the code to enable the scheduler:
Here is the error raised by the to_hdf5() method:
The error is still there when I use da.store() instead of to_hdf5(), as @mrocklin suggested. Here is the error message:
I'm running the code in a daskdev/dask-notebook Docker container. The environment:
Is there a workaround for this problem?
I think I’m running into the same problem as well.