Regridding API design
I was looking at Pangeo’s Regridding Design Document, which suggests an interface like `da.remap.remap_like(da_target, how='bilinear')`. It looks clean but doesn’t match the “two-step” procedure used by most regridding algorithms.
Regridding weight calculation and weight application should be done separately, as reviewed in #2. The weights depend only on the source and target grids, not on the input data. As long as the grids do not change, users only need to calculate the weights once and can then apply them to any data. Applying weights is often orders of magnitude faster than calculating them (see the timing in #6 as an example), so separating the two steps has a huge impact on performance, especially when the task is to regrid a lot of data between a fixed pair of grids.
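To make the cost asymmetry concrete: once computed, regridding is just a sparse matrix-vector product over the flattened grids. A minimal sketch (the grid sizes and density here are invented for illustration, not real weights):

```python
# Illustration only: weight application as a sparse matrix-vector product.
import numpy as np
import scipy.sparse as sps

n_in, n_out = 180 * 360, 90 * 180       # flattened source / target grid sizes
# Computing the weight matrix is the expensive, grid-dependent step...
weights = sps.random(n_out, n_in, density=1e-5, format='csr')

# ...while applying it to each new field is a fast matvec
# that can be repeated for any number of input fields.
data_in = np.random.rand(n_in)
data_out = weights @ data_in
```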
xESMF’s current design, `dr_out = xe.regrid(ds_in, ds_out, dr_in)`, basically re-computes the weights every time. Using two steps should boost the performance by 10~100x.
Thus I am thinking about an sklearn-like API, which is also two-step:
- In sklearn you train a model by `model.fit(x_train, y_train)`
- then make new predictions by `y_pred = model.predict(x_test)`
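A self-contained toy version of that fit/predict pattern (the model and data here are arbitrary placeholders):

```python
# Toy illustration of sklearn's two-step pattern.
import numpy as np
from sklearn.linear_model import LinearRegression

x_train = np.arange(10.0).reshape(-1, 1)
y_train = 2.0 * x_train.ravel() + 1.0

model = LinearRegression()
model.fit(x_train, y_train)         # step 1: the expensive part, done once

x_test = np.array([[12.0], [15.0]])
y_pred = model.predict(x_test)      # step 2: cheap, repeatable application
```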
xESMF can do similar things:
- Calculate regridding weights by `weights = xe.compute_weights(ds_in, ds_out, method='bilinear')`, where `ds_in` and `ds_out` are xarray Datasets containing input and output grid information.
- then apply weights to data by `dr_out = weights.apply(dr_in)`, where `dr_in` is the input DataArray.
- Because ESMPy writes the weights to a file, the next time you can read them from file instead of computing them again: `weights = xe.read_weights("weights.nc")` (sketched below). The IO time is negligible compared to re-computing the weights.
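On the third point: ESMF-generated weight files use a standard sparse-triplet layout (variables `col`, `row`, and `S`, with 1-based indices), so `xe.read_weights` could be little more than the following sketch; the function name and details are hypothetical, not settled API:

```python
# Sketch of reading an ESMF-generated weight file into a sparse matrix.
# Assumes the ESMF/SCRIP layout with variables 'col', 'row', 'S';
# variable names may differ depending on how the file was written.
import scipy.sparse as sps
import xarray as xr

def read_weights(filename, n_in, n_out):
    ds = xr.open_dataset(filename)
    matrix = sps.coo_matrix(
        (ds['S'].values,
         (ds['row'].values - 1, ds['col'].values - 1)),  # 1-based -> 0-based
        shape=(n_out, n_in),
    ).tocsr()
    # In the real API this matrix would be wrapped in the weights class below.
    return matrix

# e.g. matrix = read_weights("weights.nc", n_in=180*360, n_out=90*180)
```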
Here `weights` is a tiny class that holds the weights and knows how to apply them to data. Alternatively, `weights` can simply be an xarray Dataset read by `raw_weights = xr.open_dataset("weights.nc")`. Then step 2 changes to `dr_out = xe.apply_weights(raw_weights, dr_in)`. I prefer the first approach because it feels more like sklearn and people might find it more familiar.
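To make the “tiny class” option concrete, here is a minimal hypothetical sketch, assuming the weights live in a scipy sparse matrix (none of these names are settled API):

```python
# Hypothetical sketch of the tiny weights class; names are not final.
import numpy as np
import scipy.sparse as sps
import xarray as xr

class Weights:
    """Holds a sparse weight matrix and knows how to apply it to data."""

    def __init__(self, matrix, out_shape):
        self.matrix = matrix        # scipy.sparse, shape (n_out, n_in)
        self.out_shape = out_shape  # e.g. (n_lat_out, n_lon_out)

    def apply(self, dr_in):
        # Flatten the horizontal dimensions, do the sparse matvec,
        # and reshape back onto the target grid.
        data_out = self.matrix @ np.asarray(dr_in).ravel()
        return xr.DataArray(data_out.reshape(self.out_shape),
                            dims=('lat', 'lon'))

# Quick demo with synthetic data and random (unnormalized) weights:
matrix = sps.random(10 * 20, 20 * 40, density=0.001, format='csr')
weights = Weights(matrix, out_shape=(10, 20))
dr_out = weights.apply(xr.DataArray(np.ones((20, 40)), dims=('lat', 'lon')))
```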
Any comments are welcome; otherwise I’ll proceed this way. @rabernat @jhamman @spencerahill. Please also @ anyone who is interested in regridding with xarray.
Top GitHub Comments
I think having a step that can compute and cache the weights is a great idea, and I very much like the sklearn-style approach. There are a few standard formats that I’m aware of for caching weights - in particular, SCRIP uses a format that both NCL and CDO are in turn able to read.
Now, if everything is done in the framework of xgcm, and there’s some notion of “standard” grids for a set of known models, then it may be possible to create a data server/archive which caches the weights for well-known re-gridding operations. Think about how Cartopy has the nifty utility to grab shapefiles from Natural Earth… what if, as part of pangeo-data, there were a bucket on EC2 or GCP that catalogued and archived these re-gridding weights and could be directly downloaded by a client? The whole catalogue could be entirely automated; it could even spin up a VM on AWS or GCP to create the weights whenever a request arrives for an unknown re-gridding operation.
On this topic, we are discussing how best to implement cell distance / area / volume data (generically called “grid metrics”) in xgcm/xgcm#81. One possibility is that xgcm will take care of those geometric questions. In that case, a regridding package could map between xgcm grids, rather than xarray datasets.