
Parallel regridding support


“Parallel regridding” could mean two things (see #2 for more about this two-step regridding):

  1. Generate regridding weights in parallel.
  2. Apply regridding weights in parallel (applying the weights is just a sparse matrix multiply; see the sketch after this list).
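For context on step 2: once the weights exist, applying them amounts to a sparse matrix-vector product that maps the flattened source grid onto the flattened destination grid. A minimal sketch with scipy.sparse, where the grid sizes and weight triplets are invented for illustration (a real weight matrix would be read from an ESMF weight file):

```python
import numpy as np
import scipy.sparse as sps

# Invented sizes: a 100x200 source grid and a 50x100 destination grid.
n_in, n_out = 100 * 200, 50 * 100

# Stand-in for COO triplets read from an ESMF weight file
# (row: destination cell index, col: source cell index, data: weight).
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 2])
vals = np.array([0.7, 0.3, 1.0])
weights = sps.coo_matrix((vals, (rows, cols)), shape=(n_out, n_in)).tocsr()

field_in = np.random.rand(100, 200)        # one 2-D field on the source grid
field_out = weights @ field_in.ravel()     # applying the weights is a sparse mat-vec
field_out = field_out.reshape(50, 100)     # back to the destination grid shape
```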

The native parallel support in ESMF/ESMPy is based on MPI and horizontal domain decomposition. It works for both generating and applying weights. See https://github.com/nawendt/esmpy-tutorial/blob/master/esmpy_mpi_example.py as an example.

MPI-based horizontal domain decomposition makes perfect sense for earth system model simulation, but for data analysis I would absolutely want to avoid MPI’s complexity. With dask, there’s a simple way to go:

  1. Avoid horizontal domain decomposition and only parallelize over additional dimensions like time and level. Writing such code with dask.array will be trivial (see the sketch after this list).
  2. Still generate regridding weights in serial. My impression is that people tend to have a fixed pair of source and destination grids and regrid many data fields between them; in that case, weight generation only needs to be done once. So we would only care about applying the weights in parallel.
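Here is a hedged sketch of what point 1 might look like with dask.array: chunk only along time, and apply a precomputed sparse weight matrix to each chunk. The grid sizes, chunk sizes, and the regrid_block helper are all illustrative, not an actual implementation:

```python
import dask.array as da
import numpy as np
import scipy.sparse as sps

# Invented grid sizes: 100x200 source cells -> 50x100 destination cells.
ny_in, nx_in, ny_out, nx_out = 100, 200, 50, 100

# Stand-in for weights read from an ESMF weight file.
weights = sps.random(ny_out * nx_out, ny_in * nx_in, density=1e-4, format="csr")

# A (time, lat, lon) field chunked only along time -- no horizontal decomposition.
data = da.random.random((360, ny_in, nx_in), chunks=(30, ny_in, nx_in))

def regrid_block(block):
    """Apply the sparse weights to every 2-D slice in one time chunk."""
    nt = block.shape[0]
    flat = block.reshape(nt, -1)                 # (time, n_in)
    out = (weights @ flat.T).T                   # sparse .dot() per slice -> (time, n_out)
    return out.reshape(nt, ny_out, nx_out)

regridded = data.map_blocks(regrid_block, chunks=(30, ny_out, nx_out), dtype=data.dtype)
result = regridded.compute()                     # time chunks are regridded in parallel
```

Each time chunk becomes an independent task, so the dask scheduler can spread the work across however many workers are available without touching the horizontal dimensions.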

Is there any case in which we have to parallelize over the horizontal dimensions?

PS: Need to profile the regridding on very large data sets and figure out the bottleneck (generating vs applying weights) before starting to implement anything.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 47 (17 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Jan 1, 2018

The current version (v0.1.1) only runs in serial, although extra-dimension parallelism should be quite easy to add (literally just a parallel sparse .dot()). May add an experimental parallel method in the next release.

I think that XArray will handle things for you if you use methods like apply_ufunc

Thanks for the suggestion. I haven’t needed to regrid any data that is large enough to require a distributed cluster. If there are any specific research needs, I am willing to try.

We’re certainly dealing with multi-terabyte datasets for which distributed computation isn’t necessary, but is certainly convenient. I’ll be giving a couple of talks about distributed XArray for data analysis in the next couple of weeks. It would be nice to point to this work as an example of an active community developing functionality with serial computation in mind that “just works” without having to think much about distributed computation. I believe that XArray’s apply_ufunc was designed with this kind of thing in mind.
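A rough illustration of the apply_ufunc approach suggested above, assuming a precomputed sparse weight matrix. The apply_weights helper, dimension names, and grid sizes are hypothetical, and the exact keyword arguments (e.g. dask_gufunc_kwargs) may differ between xarray versions:

```python
import numpy as np
import scipy.sparse as sps
import xarray as xr

# Invented grid sizes and weights, standing in for a real ESMF weight file.
ny_in, nx_in, ny_out, nx_out = 100, 200, 50, 100
weights = sps.random(ny_out * nx_out, ny_in * nx_in, density=1e-4, format="csr")

def apply_weights(field):
    """Regrid an array whose last two axes are (lat, lon)."""
    lead = field.shape[:-2]
    flat = field.reshape(-1, ny_in * nx_in)
    out = (weights @ flat.T).T
    return out.reshape(lead + (ny_out, nx_out))

da_in = xr.DataArray(
    np.random.rand(360, ny_in, nx_in),
    dims=("time", "lat", "lon"),
).chunk({"time": 30})

da_out = xr.apply_ufunc(
    apply_weights,
    da_in,
    input_core_dims=[["lat", "lon"]],
    output_core_dims=[["lat_out", "lon_out"]],
    dask="parallelized",                  # let xarray map the function over dask chunks
    output_dtypes=[da_in.dtype],
    dask_gufunc_kwargs={"output_sizes": {"lat_out": ny_out, "lon_out": nx_out}},
)
result = da_out.compute()                 # each time chunk is regridded independently
```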

0 reactions
JiaweiZhuang commented, Aug 6, 2019

v0.2 now supports parallel regridding with dask.

Distributed regridding is left to pangeo-data/pangeo#334
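For reference, the dask-backed workflow in xESMF v0.2 looks roughly like the sketch below; the grids, chunk size, and variable names are invented for illustration, so check the xESMF documentation for the exact, current API:

```python
import numpy as np
import xarray as xr
import xesmf as xe

# Hypothetical 1-degree source grid and 2-degree destination grid.
ds_in = xr.Dataset(
    {"lat": (["lat"], np.arange(-89.5, 90, 1.0)),
     "lon": (["lon"], np.arange(0, 360, 1.0))}
)
ds_out = xr.Dataset(
    {"lat": (["lat"], np.arange(-89, 90, 2.0)),
     "lon": (["lon"], np.arange(0, 360, 2.0))}
)

# Weight generation is still serial...
regridder = xe.Regridder(ds_in, ds_out, "bilinear", periodic=True)

# ...but applying the weights to a dask-chunked DataArray runs chunk by chunk.
data = xr.DataArray(
    np.random.rand(360, 180, 360),
    dims=["time", "lat", "lon"],
    coords={"lat": ds_in["lat"], "lon": ds_in["lon"]},
).chunk({"time": 30})

data_out = regridder(data)    # lazy, dask-backed result
data_out = data_out.compute()
```

Weight generation (the xe.Regridder call) stays serial, while applying the weights to the chunked data runs in parallel, which matches the plan laid out in the issue above.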
