Pickling issues when using classes and object-oriented Python
…or at least I suspect this is the problem. Forgive me if this is more of a Dask distributed issue and not one necessarily tied to dask-kubernetes, but since I’m running into it while using the latter, I thought I’d post here.
At any rate, this is related to https://github.com/pangeo-data/storage-benchmarks for the Pangeo project. We’re using Airspeed Velocity (ASV) for this, which is object-oriented: I’ve set up the tests so that storage setup/teardown is handled by one set of classes and the benchmarks themselves are another set.
For example, I have a synthetic write benchmark that instantiates a Zarr storage object and then writes a Dask array to it:
# Module-level imports needed by this snippet (target_zarr is a
# project-local helper from the storage-benchmarks repo; the exact
# import path may differ).
from subprocess import call

import dask.array as da
import numpy as np

import target_zarr


class IOWrite_Zarr():
    timeout = 300
    # number = 1
    warmup_time = 0.0
    params = (['POSIX', 'GCS', 'FUSE'])
    param_names = ['backend']

    def setup(self, backend):
        chunksize = (10, 100, 100)
        self.da = da.random.normal(10, 0.1, size=(100, 100, 100),
                                   chunks=chunksize)
        self.da_size = np.round(self.da.nbytes / 1024**2, 2)
        self.target = target_zarr.ZarrStore(backend=backend, dask=True,
                                            chunksize=chunksize,
                                            shape=self.da.shape,
                                            dtype=self.da.dtype)
        self.target.get_temp_filepath()

        if backend == 'GCS':
            gsutil_arg = "gs://%s" % self.target.gcs_zarr
            call(["gsutil", "-q", "-m", "rm", "-r", gsutil_arg])

    def time_synthetic_write(self, backend):
        self.da.store(self.target.storage_obj)

    def teardown(self, backend):
        self.target.rm_objects()
When I put code anywhere in there to start up my Dask pods,
from dask_kubernetes import KubeCluster
cluster = KubeCluster.from_yaml('worker-spec.yml')
cluster.adapt()
from dask.distributed import Client
client = Client(cluster)
my benchmarks die a horrible death with pickle error messages (truncated here for brevity):
For parameters: 'GCS'
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 38, in dumps
result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: can't pickle _thread.lock objects
...
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 795, in <module>
commands[mode](args)
File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 772, in main_run
result = benchmark.do_run()
File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 456, in do_run
return self.run(*self._current_params)
File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 548, in run
all_runs.extend(timer.repeat(repeat, number))
File "/opt/conda/lib/python3.6/timeit.py", line 206, in repeat
t = self.timeit(number)
File "/opt/conda/lib/python3.6/timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 512, in <lambda>
func = lambda: self.func(*param)
File "/home/jovyan/dev/storage-benchmarks-kai/benchmarks/IO_dask.py", line 57, in time_synthetic_write
self.da.store(self.target.storage_obj)
File "/opt/conda/lib/python3.6/site-packages/dask/array/core.py", line 1211, in store
r = store([self], [target], **kwargs)
File "/opt/conda/lib/python3.6/site-packages/dask/array/core.py", line 955, in store
result.compute(**kwargs)
File "/opt/conda/lib/python3.6/site-packages/dask/base.py", line 155, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/dask/base.py", line 404, in compute
results = get(dsk, keys, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 2064, in get
resources=resources)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 2021, in _graph_to_futures
'tasks': valmap(dumps_task, dsk3),
File "cytoolz/dicttoolz.pyx", line 165, in cytoolz.dicttoolz.valmap
File "cytoolz/dicttoolz.pyx", line 190, in cytoolz.dicttoolz.valmap
File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 718, in dumps_task
'args': warn_dumps(task[1:])}
File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 727, in warn_dumps
b = dumps(obj)
File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 51, in dumps
return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
File "/opt/conda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 881, in dumps
cp.dump(obj)
File "/opt/conda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 268, in dump
return Pickler.dump(self, obj)
File "/opt/conda/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/opt/conda/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/opt/conda/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/opt/conda/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle _thread.lock objects
Using mount point: /tmp/tmpi1hpqq5w
I’ve found a workaround: putting everything into a single callable def seems to work OK, but it leads to some messy and redundant code. I’m hoping there’s a straightforward(ish) way to get classes to work with dask_kubernetes.
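For reference, the single-function workaround looks roughly like the sketch below. This is illustrative only, not the actual benchmark code: make_store() is a hypothetical helper standing in for the target_zarr setup, and the cluster sizing is whatever adapt() decides.

# Rough sketch of the "single callable" workaround (illustrative only;
# make_store() is a hypothetical helper, not part of the real repo).
import dask.array as da
from dask.distributed import Client
from dask_kubernetes import KubeCluster


def synthetic_write(backend):
    cluster = KubeCluster.from_yaml('worker-spec.yml')
    cluster.adapt()
    client = Client(cluster)
    try:
        chunksize = (10, 100, 100)
        arr = da.random.normal(10, 0.1, size=(100, 100, 100), chunks=chunksize)
        store = make_store(backend)  # hypothetical: builds the Zarr target
        arr.store(store)
    finally:
        client.close()
        cluster.close()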
Top GitHub Comments
OK to close this then?
Hi @kaipak, thanks for the issue. I recommend one of two solutions to help track this down:
Create a minimal example
It would be useful to take your current example and remove as much as possible from it while still maintaining the exception. For example, if you take away the entire class then, from what I understand, things work. How about if you take away some of the methods or attributes? Do things still break, or do they work OK? I think that if you try taking away different parts from your example you may be able to find a particular piece that is causing problems.
More general thoughts on this topic are in this blog post: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
This would be my first recommendation. It’s also a good practice to get used to.
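As a concrete starting point, one way to shrink this outside of ASV is to try serializing the individual pieces directly. The sketch below assumes target_zarr.ZarrStore accepts the same arguments shown in the benchmark above; adjust the import to match the repo layout.

# Sketch of a minimal reproduction, with ASV and the benchmark class
# removed from the picture entirely.
import cloudpickle
import dask.array as da
import target_zarr  # project-local helper from storage-benchmarks

chunksize = (10, 100, 100)
arr = da.random.normal(10, 0.1, size=(100, 100, 100), chunks=chunksize)

target = target_zarr.ZarrStore(backend='POSIX', dask=True,
                               chunksize=chunksize, shape=arr.shape,
                               dtype=arr.dtype)
target.get_temp_filepath()

for name, obj in [('dask array', arr), ('storage_obj', target.storage_obj)]:
    try:
        cloudpickle.dumps(obj)
        print(name, 'serializes fine')
    except TypeError as exc:
        print(name, 'fails to serialize:', exc)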
Use pdb and look through the pickle stack trace
Pickle is having trouble serializing a lock. This is not surprising, because locks aren't serializable (they won't make sense when they get deserialized). So you could run this in IPython and then use the %debug magic to walk up the stack trace (using up) and print the object that is being pickled. What is holding onto the lock? What is holding onto that object? Eventually, as you climb up the stack, you might find some object that you recognize and can easily control.
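Complementary to %debug, a small helper like the one sketched below can report which attribute of a suspect object is the unpicklable one (the helper and its name are made up for illustration):

# Probe which attributes of an object cannot be serialized.
import cloudpickle


def find_unpicklable_attrs(obj):
    for name, value in vars(obj).items():
        try:
            cloudpickle.dumps(value)
        except Exception as exc:
            print('%s.%s cannot be pickled: %s' % (type(obj).__name__, name, exc))

# e.g. call find_unpicklable_attrs(self.target.storage_obj) from setup()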
Dask handles objects
Just to be clear, Dask is perfectly happy to move around normal Python objects as long as they are serializable with cloudpickle. In this case one of those objects has a thread lock, which stops it from being serializable.
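To make that last point concrete, here is a toy example (not from the issue) showing that an object holding a threading.Lock fails to cloudpickle, and that one common remedy is to drop and recreate the lock in __getstate__/__setstate__:

# Toy example: Store() below cloudpickles fine; remove __getstate__ and
# the same call raises "TypeError: can't pickle _thread.lock objects".
import threading
import cloudpickle


class Store:
    def __init__(self):
        self.lock = threading.Lock()   # the unpicklable piece
        self.data = {}

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['lock']              # don't try to ship the lock
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()   # make a fresh lock after unpickling


cloudpickle.dumps(Store())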