Parallel regridding support
“Parallel regridding” could mean two things (see #2 for more about this two-step regridding):
- Generate regridding weights in parallel.
- Apply regridding weights in parallel.
The native parallel support in ESMF/ESMPy is based on MPI and horizontal domain decomposition. It works for both generating and applying weights. See https://github.com/nawendt/esmpy-tutorial/blob/master/esmpy_mpi_example.py as an example.
MPI-based horizontal domain decomposition makes perfect sense for earth system model simulation, but for data analysis I would absolutely want to avoid MPI’s complexity. With dask, there’s a simpler way to go:
- Avoid horizontal domain decomposition and only parallelize over additional dimensions like time and level. Writing such code with `dask.array` will be trivial.
- Still generate regridding weights in serial. My impression is that people tend to have a fixed pair of source and destination grids and regrid many data fields between them. In this case, weight generation only needs to be done once, so we would only care about applying the weights in parallel.
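A minimal sketch of this approach, assuming the weights were already generated in serial: chunk only over time with dask and apply the fixed weights to each horizontal slice. The random sparse matrix and all grid sizes below are illustrative stand-ins for real ESMF-generated weights.

```python
import numpy as np
import scipy.sparse as sps
import dask.array as da

ny_in, nx_in = 24, 48     # source grid size (illustrative)
ny_out, nx_out = 18, 36   # destination grid size (illustrative)

# Stand-in for precomputed regridding weights: a sparse
# (n_out, n_in) matrix generated once, in serial.
weights = sps.random(ny_out * nx_out, ny_in * nx_in,
                     density=0.01, random_state=0).tocsr()

def apply_weights(block):
    """Apply the fixed weights to every 2-D slice of a (time, y, x) block."""
    nt = block.shape[0]
    flat = block.reshape(nt, ny_in * nx_in)
    out = weights.dot(flat.T).T          # sparse matmul per time slice
    return out.reshape(nt, ny_out, nx_out)

# Chunk over time only -- no horizontal domain decomposition.
data = da.random.random((100, ny_in, nx_in), chunks=(10, ny_in, nx_in))
regridded = data.map_blocks(apply_weights, dtype=float,
                            chunks=(10, ny_out, nx_out))
result = regridded.compute()             # blocks are processed in parallel
```

Because the weight matrix is shared read-only across blocks, each time chunk can be regridded independently, which is exactly the embarrassingly parallel case dask handles well.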
Is there any case where we would have to parallelize over the horizontal dimensions?
PS: We need to profile regridding on very large data sets and identify the bottleneck (generating vs. applying weights) before starting to implement anything.
Issue Analytics
- Created 6 years ago
- Comments:47 (17 by maintainers)
Top GitHub Comments
I think that XArray will handle things for you if you use methods like `apply_ufunc`.
We’re certainly dealing with multi-terabyte datasets for which distributed computation isn’t necessary, but is certainly convenient. I’ll be giving a couple of talks about distributed XArray for data analysis in the next couple of weeks. It would be nice to point to this work as an example of an active community developing functionality with serial computation in mind that “just works” without having to think much about distributed computation. I believe that XArray’s `apply_ufunc` was designed with this kind of thing in mind.

v0.2 now supports parallel regridding with dask.
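As a sketch of the `apply_ufunc` pattern mentioned above: the kernel below is a placeholder (plain 2×2 block averaging rather than real weight application), and the names `regrid_slice`, `lat_out`, and `lon_out` are illustrative. The point is that the horizontal dimensions are declared as core dimensions, so xarray broadcasts the kernel over the remaining dimensions such as time.

```python
import numpy as np
import xarray as xr

def regrid_slice(field):
    # Placeholder kernel: average 2x2 blocks over the last two (horizontal)
    # axes; a real kernel would apply precomputed regridding weights instead.
    ny, nx = field.shape[-2] // 2, field.shape[-1] // 2
    return field.reshape(field.shape[:-2] + (ny, 2, nx, 2)).mean(axis=(-3, -1))

da_in = xr.DataArray(np.random.rand(5, 8, 16), dims=["time", "lat", "lon"])
out = xr.apply_ufunc(
    regrid_slice, da_in,
    input_core_dims=[["lat", "lon"]],        # kernel consumes these dims
    output_core_dims=[["lat_out", "lon_out"]],  # and produces these
)
```

With a dask-backed input, adding `dask="parallelized"` (plus `output_dtypes` and the output sizes via `dask_gufunc_kwargs`) makes the same call parallelize over the non-core `time` dimension.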
Distributed regridding is left to pangeo-data/pangeo#334