question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Support parallel regridding with dask-distributed

See original GitHub issue

Previous issues have discussed supporting dask enabled parallel regridding (e.g. #3). This seems to be working for the threaded scheduler but not for the distributed scheduler. It seems like this should be doable at this point with some work to solve some serialization problems.

Current behavior

If you run the current dask regridding example in this repo’s binder setup with dask-distributed, you get a bunch of serialization errors:

[21] result = ds_out['air'].compute()  # actually applies regridding
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
    for key, value in data.items()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
    if type(value) is Serialize
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
    raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
distributed.comm.utils - ERROR - ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 29, in _to_frames
    msg, serializers=serializers, on_error=on_error, context=context
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
    for key, value in data.items()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
    if type(value) is Serialize
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
    raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/batched.py", line 93, in _background_send
    payload, serializers=self.serializers, on_error="raise"
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/tcp.py", line 227, in write
    context={"sender": self._local_addr, "recipient": self._peer_addr},
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 37, in to_frames
    res = yield offload(_to_frames)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/utils.py", line 1370, in offload
    return (yield _offload_executor.submit(fn, *args, **kwargs))
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 29, in _to_frames
    msg, serializers=serializers, on_error=on_error, context=context
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
    for key, value in data.items()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
    if type(value) is Serialize
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
    for obj in x
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
    raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')

From what I can tell, it seems like there is some object that dask is trying to serialize that can’t be pickled. Has anyone looked into this to diagnose why this is happening?

Issue Analytics

  • State:open
  • Created 4 years ago
  • Comments:5 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
JiaweiZhuangcommented, Oct 19, 2019

Aha, problem solved. Just set this before applying the regridder to data:

regridder._grid_in = None
regridder._grid_out = None

regridder._grid_in was to linked to ESMF objects that involve f2py and ctypes, and Dask was having trouble pickling it. In the next version I will make sure that the Regridder class does not refer to any ESMF objects.

0reactions
dgergelcommented, May 14, 2020

@JiaweiZhuang in your example notebook that you include above, what versions of xESMF and dask distributed are you using? I am still getting the ...cannot be pickled... error when I replicate your sample workflow.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Lazy evaluation on Dask arrays — xESMF 0.3.0 documentation
If you are unfamiliar with Dask, read Parallel computing with Dask in Xarray documentation first. ... Support of Dask.distributed is in roadmap.
Read more >
Efficiency — Dask.distributed 2022.12.1 documentation
Parallel computing done well is responsive and rewarding. However, several speed-bumps can get in the way. This section describes common ways to ensure ......
Read more >
Introduction — Parallel Programming in Climate and Weather
The dask library provides parallel versions of many operations available in numpy and pandas. It does this by breaking up an array into...
Read more >
Parallel Computing with Dask: A Step-by-Step Tutorial
Dask is a one-stop solution for larger-than-memory data sets, as it provides multicore and distributed parallel execution.
Read more >
Dask RESAMPLING (can't handle large files...) - Science
I tried using the pyresample method to do my resampling in a parallel fashion (since it is dask-enabled), but I watched my dask-distributed...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found