Support parallel regridding with dask-distributed
Previous issues have discussed supporting dask-enabled parallel regridding (e.g. #3). This works with the threaded scheduler but not with the distributed scheduler. It should be doable at this point, with some work to solve the serialization problems.
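The threaded/distributed split is consistent with how the two schedulers move work: threads share one address space, while distributed workers receive each task over the wire, so every callable in the task graph must first be pickled. The asymmetry can be sketched with the standard library alone (a toy stand-in, not dask itself; the lambda here mimics a task wrapping an unpicklable object):

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

task = lambda: 42  # stand-in for a task-graph entry holding an unpicklable object

# Threaded execution shares memory: the callable is never serialized.
with ThreadPoolExecutor() as pool:
    print(pool.submit(task).result())  # prints 42

# A distributed scheduler must ship each task to a worker process,
# which means pickling it first -- and that is where it breaks.
try:
    pickle.dumps(task)
except Exception as exc:
    print("cannot be pickled:", exc.__class__.__name__)
```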
Current behavior
If you run the current dask regridding example in this repo’s binder setup with dask-distributed, you get a bunch of serialization errors:
[21] result = ds_out['air'].compute() # actually applies regridding
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
for key, value in data.items()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
if type(value) is Serialize
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
distributed.comm.utils - ERROR - ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 29, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
for key, value in data.items()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
if type(value) is Serialize
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/batched.py", line 93, in _background_send
payload, serializers=self.serializers, on_error="raise"
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/tcp.py", line 227, in write
context={"sender": self._local_addr, "recipient": self._peer_addr},
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 37, in to_frames
res = yield offload(_to_frames)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/utils.py", line 1370, in offload
return (yield _offload_executor.submit(fn, *args, **kwargs))
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/comm/utils.py", line 29, in _to_frames
msg, serializers=serializers, on_error=on_error, context=context
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 44, in dumps
for key, value in data.items()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 45, in <dictcomp>
if type(value) is Serialize
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in serialize
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 167, in <listcomp>
for obj in x
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 210, in serialize
raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type SubgraphCallable.', 'subgraph_callable')
From what I can tell, dask is trying to serialize some object that cannot be pickled. Has anyone looked into diagnosing why this happens?
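One way to diagnose it is to probe the pieces directly on the client: try to pickle each candidate object and recurse into whatever fails. A small helper along those lines (an illustration, not part of xESMF or dask; `Holder` and its `handle` attribute are made up to mimic an object wrapping a ctypes/f2py pointer):

```python
import pickle

def find_unpicklable(obj, path="obj"):
    """Return the paths of objects under obj that fail to pickle."""
    try:
        pickle.dumps(obj)
        return []  # this subtree serializes fine
    except Exception:
        bad = [path]
        # Recurse into instance attributes (or dict entries) to narrow it down.
        children = getattr(obj, "__dict__", None) or (obj if isinstance(obj, dict) else {})
        for name, value in children.items():
            bad += find_unpicklable(value, f"{path}.{name}")
        return bad

class Holder:
    def __init__(self):
        self.data = [1, 2, 3]       # picklable payload
        self.handle = lambda: None  # unpicklable, like a ctypes pointer

print(find_unpicklable(Holder()))  # ['obj', 'obj.handle']
```

Running this against the regridder object (and the arguments captured in the task graph) should point at the attribute that trips up dask-distributed.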
Issue Analytics
- Created: 4 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Aha, problem solved. Just set this before applying the regridder to data:

`regridder._grid_in` was linked to ESMF objects that involve `f2py` and `ctypes`, and Dask was having trouble pickling it. In the next version I will make sure that the `Regridder` class does not refer to any ESMF objects.

@JiaweiZhuang in your example notebook that you include above, what versions of xESMF and dask-distributed are you using? I am still getting the `...cannot be pickled...` error when I replicate your sample workflow.
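The fix described in the first comment generalizes: before a graph is handed to dask-distributed, any reference to a C-level handle has to be dropped or replaced, since pickle cannot cross that boundary. A minimal sketch of the pattern (the class below is a toy stand-in, not the real xESMF `Regridder`; its `_grid_in` lambda mimics the f2py/ctypes-backed ESMF object):

```python
import pickle

class Regridder:
    """Toy stand-in for a regridder whose _grid_in wraps an f2py/ctypes object."""
    def __init__(self):
        self.weights = [[0.5, 0.5], [0.25, 0.75]]  # plain data: pickles fine
        self._grid_in = lambda lat, lon: None      # stand-in for the ESMF handle

regridder = Regridder()

# With the handle attached, the object cannot cross a worker boundary.
try:
    pickle.dumps(regridder)
except Exception:
    print("regridder is not picklable")

# Dropping the reference -- as the comment above reports -- fixes it;
# only the plain-data weights need to travel to the workers.
regridder._grid_in = None
restored = pickle.loads(pickle.dumps(regridder))
print(restored.weights == regridder.weights)  # True
```

This is also why keeping the `Regridder` class free of ESMF references, as promised in the comment, resolves the issue for good: the sparse regridding weights are ordinary data and serialize without trouble.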