
Pickling issues when using classes and object oriented Python

See the original GitHub issue: https://github.com/dask/dask-kubernetes/issues/68

… or at least I suspect this is the problem. Forgive me if this is more of a Dask distributed issue and not one necessarily tied to dask-kubernetes, but as I’m running into this problem using the latter, I thought I’d post here.

At any rate, this is related to https://github.com/pangeo-data/storage-benchmarks for the Pangeo project. We’re using Airspeed Velocity for this, which is object-oriented. I’ve set up the tests so that storage setup/teardown are a bunch of classes and the benchmarks themselves are another set.

For example, I have a synthetic write benchmark that instantiates a Zarr storage object and runs a timed write against it:

import dask.array as da
import numpy as np
from subprocess import call

import target_zarr  # helper module from the storage-benchmarks repo


class IOWrite_Zarr:
    # ASV benchmark attributes
    timeout = 300
    # number = 1
    warmup_time = 0.0
    params = ['POSIX', 'GCS', 'FUSE']
    param_names = ['backend']

    def setup(self, backend):
        chunksize = (10, 100, 100)
        self.da = da.random.normal(10, 0.1, size=(100, 100, 100),
                                   chunks=chunksize)
        self.da_size = np.round(self.da.nbytes / 1024**2, 2)  # size in MiB
        self.target = target_zarr.ZarrStore(backend=backend, dask=True,
                                            chunksize=chunksize,
                                            shape=self.da.shape,
                                            dtype=self.da.dtype)
        self.target.get_temp_filepath()

        if backend == 'GCS':
            # clear out any leftovers from a previous run
            gsutil_arg = "gs://%s" % self.target.gcs_zarr
            call(["gsutil", "-q", "-m", "rm", "-r", gsutil_arg])

    def time_synthetic_write(self, backend):
        self.da.store(self.target.storage_obj)

    def teardown(self, backend):
        self.target.rm_objects()

When I put code anywhere in there to start up my dask pods,

from dask_kubernetes import KubeCluster
cluster = KubeCluster.from_yaml('worker-spec.yml')
cluster.adapt()
from dask.distributed import Client
client = Client(cluster)

My benchmarks die a horrible death with pickle error messages (truncated here for brevity):

                For parameters: 'GCS'
                Traceback (most recent call last):
                  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 38, in dumps
                    result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
                TypeError: can't pickle _thread.lock objects
...

                During handling of the above exception, another exception occurred:

                Traceback (most recent call last):
                  File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 795, in <module>
                    commands[mode](args)
                  File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 772, in main_run
                    result = benchmark.do_run()
                  File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 456, in do_run
                    return self.run(*self._current_params)
                  File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 548, in run
                    all_runs.extend(timer.repeat(repeat, number))
                  File "/opt/conda/lib/python3.6/timeit.py", line 206, in repeat
                    t = self.timeit(number)
                  File "/opt/conda/lib/python3.6/timeit.py", line 178, in timeit
                    timing = self.inner(it, self.timer)
                  File "<timeit-src>", line 6, in inner
                  File "/home/jovyan/.local/lib/python3.6/site-packages/asv/benchmark.py", line 512, in <lambda>
                    func = lambda: self.func(*param)
                  File "/home/jovyan/dev/storage-benchmarks-kai/benchmarks/IO_dask.py", line 57, in time_synthetic_write
                    self.da.store(self.target.storage_obj)
                  File "/opt/conda/lib/python3.6/site-packages/dask/array/core.py", line 1211, in store
                    r = store([self], [target], **kwargs)
                  File "/opt/conda/lib/python3.6/site-packages/dask/array/core.py", line 955, in store
                    result.compute(**kwargs)
                  File "/opt/conda/lib/python3.6/site-packages/dask/base.py", line 155, in compute
                    (result,) = compute(self, traverse=False, **kwargs)
                  File "/opt/conda/lib/python3.6/site-packages/dask/base.py", line 404, in compute
                    results = get(dsk, keys, **kwargs)
                  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 2064, in get
                    resources=resources)
                  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 2021, in _graph_to_futures
                    'tasks': valmap(dumps_task, dsk3),
                  File "cytoolz/dicttoolz.pyx", line 165, in cytoolz.dicttoolz.valmap
                  File "cytoolz/dicttoolz.pyx", line 190, in cytoolz.dicttoolz.valmap
                  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 718, in dumps_task
                    'args': warn_dumps(task[1:])}
                  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 727, in warn_dumps
                    b = dumps(obj)
                  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 51, in dumps
                    return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
                  File "/opt/conda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 881, in dumps
                    cp.dump(obj)
                  File "/opt/conda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 268, in dump
                    return Pickler.dump(self, obj)
                  File "/opt/conda/lib/python3.6/pickle.py", line 409, in dump
                    self.save(obj)
                  File "/opt/conda/lib/python3.6/pickle.py", line 476, in save
                    f(self, obj) # Call unbound method with explicit self
                  File "/opt/conda/lib/python3.6/pickle.py", line 751, in save_tuple
                    save(element)
                  File "/opt/conda/lib/python3.6/pickle.py", line 496, in save
                    rv = reduce(self.proto)
                TypeError: can't pickle _thread.lock objects
                Using mount point: /tmp/tmpi1hpqq5w

I’ve found a workaround by putting everything into a single callable def, and that seems to work OK; however, it leads to some messy and redundant code. I’m hoping there’s a straightforward(ish) way to get classes working with dask_kubernetes.
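(For reference, a sketch of what that single-callable workaround might look like, reusing the names from the class above; this is illustrative rather than the exact workaround code:)

def time_synthetic_write(backend):
    # setup(), the timed write, and teardown() inlined into one def,
    # so nothing is stored on self between ASV calls
    chunksize = (10, 100, 100)
    arr = da.random.normal(10, 0.1, size=(100, 100, 100), chunks=chunksize)
    target = target_zarr.ZarrStore(backend=backend, dask=True,
                                   chunksize=chunksize, shape=arr.shape,
                                   dtype=arr.dtype)
    target.get_temp_filepath()
    arr.store(target.storage_obj)  # the timed operation
    target.rm_objects()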

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Apr 13, 2018

OK to close this then?

On Fri, Apr 13, 2018 at 5:36 AM, Kai Pak notifications@github.com wrote:

> Does this only happen when using KubeCluster? I suspect that people within the pangeo or xarray issue trackers are more likely to know more about using zarr and locking.

Yeah, this was happening with KubeCluster. After reading the docs a bit more, I realized that,

dask.set_options(get=dask.threaded.get)

is not what I needed, and the code works without it. Pretty sweet seeing tens of gigabytes being written in seconds! I’ll open up another thread over at Pangeo, as I’ve run into a couple of other issues when writing large (roughly >10 GB) datasets.


1 reaction
mrocklin commented, Apr 11, 2018

Hi @kaipak, thanks for the issue. I recommend one of two solutions to help track this down:

Create a minimal example

It would be useful to take your current example and remove as much as possible from it while still maintaining the exception. For example, if you take away the entire class then, from what I understand, things work. How about if you take away some of the methods or attributes? Do things still break, or do they work OK? I think that if you try taking away different parts of your example you may be able to find a particular piece that is causing problems.

More general thoughts on this topic in this blog: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

This would be my first recommendation. It’s also a good practice to get used to.
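As a sketch of the kind of reduction meant here (the array shape and worker-spec.yml are taken from the post above; the final store call is left commented out because the real target object comes from the benchmark code):

import dask.array as da
from dask.distributed import Client
from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_yaml('worker-spec.yml')
cluster.adapt()
client = Client(cluster)

arr = da.random.normal(10, 0.1, size=(100, 100, 100), chunks=(10, 100, 100))

# Step 1: a bare computation -- if this works, the cluster itself is fine.
print(arr.sum().compute())

# Step 2: reintroduce benchmark pieces one at a time (the storage target,
# the extra attributes, ...) until the pickle error reappears:
# arr.store(target.storage_obj)  # 'target' as built in the benchmark's setup()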

Use pdb and look through the pickle stack trace

Pickle is having trouble serializing a lock. This is not surprising, because locks aren’t serializable (they won’t make sense when they get deserialized). So you could run this in IPython and then use the %debug magic to walk up the stack trace (using up) and print the object that is being pickled. What is holding onto the lock? What is holding onto that object? Eventually, as you climb up the stack, you might find some object that you recognize and can easily control.
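Concretely, a post-mortem session might look like this sketch (commands are standard IPython/pdb; the variable name obj matches the save() frames in pickle.py):

# In IPython, immediately after the TypeError is raised:
%debug
# Inside the post-mortem debugger:
#   up           # climb one frame at a time toward the calling code
#   p obj        # in pickle.py frames, 'obj' is the object being saved
#   p type(obj)
# Keep going 'up' until an object you recognize appears -- that object
# (or one of its attributes) is what is holding the _thread.lock.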

Dask handles objects

Just to be clear, dask is perfectly happy to move around normal Python objects as long as they are serializable with cloudpickle. In this case one of those objects has a thread lock, which stops it from being serializable.
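A quick way to check any particular object is to round-trip it through cloudpickle yourself (here the object to test is whatever the benchmark hands to dask, e.g. the storage target):

import cloudpickle

def is_cloudpicklable(obj):
    # True if dask could ship this object to workers as-is
    try:
        cloudpickle.loads(cloudpickle.dumps(obj))
        return True
    except TypeError:
        return False

# e.g. is_cloudpicklable(target.storage_obj) would return False
# if the store object holds a _thread.lock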
