Handle different grid coordinate formats and naming

This is a meta-issue summarizing those frequently-asked questions:

Coordinate naming: #5 #13 #38 #73
Boundary formatting: https://github.com/JiaweiZhuang/xESMF/issues/14#issuecomment-369686779, https://github.com/JiaweiZhuang/xESMF/issues/32#issuecomment-418909576
Upstream: pydata/xarray#475

The problem

xESMF requires the input grid objects to contain variables lon/lat of shape (n_lat, n_lon), and optionally lon_b/lat_b of shape (n_lat+1, n_lon+1) for conservative regridding.

This leads to a naming problem, since the original names might be latitude, lat_bnds, or latitude_bnds, and a boundary formatting problem, since the original boundary array might have shape (n_lat, n_lon, 4) instead of (n_lat+1, n_lon+1). The current fix is to rename the coordinates (#5) and reformat the boundaries (https://github.com/JiaweiZhuang/xESMF/issues/14#issuecomment-369686779).

The two problems often occur together: the CF-convention uses the name lat_bnds with a shape of (n_lat, n_lon, 4).

An upstream cause is that an xarray.Dataset has no notion of cell boundaries; other packages like xgcm try to work around that (#13). Xarray also does not enforce the CF convention.
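To make the expected layout concrete, here is a minimal sketch that builds the four arrays with NumPy only. The 3 x 4 grid size and the plain dictionary container are illustrative assumptions, not something xESMF prescribes beyond the array shapes and names.

```python
import numpy as np

# Hypothetical 3 x 4 global grid; the size is chosen only for illustration.
n_lat, n_lon = 3, 4

# 1-D cell boundaries: one more boundary than cells along each axis.
lat_b_1d = np.linspace(-90.0, 90.0, n_lat + 1)
lon_b_1d = np.linspace(-180.0, 180.0, n_lon + 1)

# 1-D cell centers are the midpoints between neighbouring boundaries.
lat_1d = 0.5 * (lat_b_1d[:-1] + lat_b_1d[1:])
lon_1d = 0.5 * (lon_b_1d[:-1] + lon_b_1d[1:])

# Broadcast to 2-D. With the default indexing='xy', np.meshgrid(x, y)
# returns arrays of shape (len(y), len(x)), i.e. (n_lat, n_lon) here.
lon, lat = np.meshgrid(lon_1d, lat_1d)
lon_b, lat_b = np.meshgrid(lon_b_1d, lat_b_1d)

# Centers have shape (n_lat, n_lon); boundaries have (n_lat + 1, n_lon + 1).
grid = {"lon": lon, "lat": lat, "lon_b": lon_b, "lat_b": lat_b}
```

Any input whose names or boundary shapes differ from this layout has to be renamed or reformatted first.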

Desired features for the solution

1. Unambiguous

xesmf should always be very clear and strict about the expected grid format. This prevents tricky edge cases and user confusion. What if the input dataset/dictionary contains all three variables lat_b, latitude_b, and latitude_bnd? Which one is picked? Or should it throw a “duplication error”? (https://github.com/JiaweiZhuang/xESMF/pull/38#issuecomment-431536946)

There can be options to choose one of the names, but at any given time there should be only one valid name, which can be printed out by something like xesmf.get_config().

The expected boundary array format also has to be explicit. There can be an extra preprocessing function to reformat the boundary, but such an option has to be set explicitly so that users are aware of what they are doing.
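One way to read the "only one valid name at a time" rule is sketched below. The helper resolve_grid_names is hypothetical and not part of xesmf; it assumes lon_b/lat_b are optional and that any alias other than the configured name is ignored rather than guessed at.

```python
def resolve_grid_names(available, grid_name_dict):
    """Map each internal key to the single configured variable name.

    Hypothetical helper: `available` holds the variable names found in the
    input grid. Only the configured name is accepted for each key; other
    spellings that happen to be present are ignored, never guessed at.
    """
    available = set(available)
    resolved = {}
    for key, expected in grid_name_dict.items():
        # Assumption: boundaries are optional (conservative regridding only).
        optional = key.endswith("_b")
        if expected in available:
            resolved[key] = expected
        elif not optional:
            raise KeyError(
                f"expected variable {expected!r} for {key!r}; "
                f"found {sorted(available)}"
            )
    return resolved


names = {"lon": "longitude", "lat": "latitude",
         "lon_b": "longitude_bnds", "lat_b": "latitude_bnds"}

# 'lat_b' is also present, but only the configured 'latitude_bnds' is used.
found = resolve_grid_names(
    ["longitude", "latitude", "longitude_bnds", "latitude_bnds",
     "lat_b", "temperature"],
    names,
)
```

Because the lookup never falls back to alternative spellings, the "which one is picked?" question has exactly one answer: the name in the active configuration.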

2. Simple

This is a simple problem and needs a simple, intuitive solution. Although annoying, this issue does not affect xesmf’s core functionality (the regridding computation). I am hesitant to add complex preprocessing logic or clever heuristics that guess the names and tweak the grid format. Complex code adds maintenance cost and confuses users who do not need such a feature.

An alternative, simple “fix” to this issue is to add some example notebooks showing how to preprocess various grid formats, without complicating the package source code.

If changes are made to the source code, they should not add many lines of new code or many if/else switches, which are an indication of complex logic.

Proposed solutions

1. Allow custom coordinate naming, by implementing xesmf.config.set(grid_name_dict=...) as global config or context manager.

(Originally proposed at https://github.com/JiaweiZhuang/xESMF/pull/38#issuecomment-432028269)

The grid_name_dict exactly follows xarray.Dataset.rename(name_dict). xesmf.config.set works similarly to xarray.set_options or dask.config.set. There should also be an xesmf.config.get('grid_name_dict') to print the currently expected grid names.

Example usage:

import xesmf as xe
xe.config.get('grid_name_dict')  # prints {'lon': 'lon', 'lat': 'lat', 'lon_b': 'lon_b', 'lat_b': 'lat_b'}, which is the default

grid_name_dict = {'lon': 'longitude', 'lat': 'latitude', 
                  'lon_b': 'longitude_bnds', 'lat_b': 'latitude_bnds'}

with xe.config.set(grid_name_dict=grid_name_dict):  # set it regionally
    xe.Regridder(...)  # automatically use user-specified `grid_name_dict`
    xe.config.get('grid_name_dict')  # prints the modified config

xe.config.set(grid_name_dict=grid_name_dict)  # or set it globally
xe.Regridder(...)  # always uses the new global config

xe.config.refresh()  # back to default, similar to dask.config.refresh()

xesmf.config.set might also be used to set other general configurations, although I haven’t thought of an example. If there are no other configurable parameters, a single-purpose pair of functions xesmf.set_grid_name() / xesmf.get_grid_name() could be implemented instead.
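A rough sketch of how such a config module could behave is given below, loosely modeled on the dask.config.set pattern (apply on construction, restore on context exit). All names here (set_config, get, refresh, _DEFAULTS) are assumptions for illustration; in a real xesmf.config module the class would presumably be exposed as set.

```python
import copy

# Assumed defaults, taken from the example usage in this proposal.
_DEFAULTS = {
    "grid_name_dict": {"lon": "lon", "lat": "lat",
                       "lon_b": "lon_b", "lat_b": "lat_b"},
}
_config = copy.deepcopy(_DEFAULTS)


def get(key):
    """Return a copy of the current value of a configuration key."""
    return copy.deepcopy(_config[key])


class set_config:
    """Apply new values immediately; restore the old ones on context exit."""

    def __init__(self, **kwargs):
        unknown = [k for k in kwargs if k not in _DEFAULTS]
        if unknown:
            raise KeyError(f"unknown config keys: {unknown}")
        # Remember the previous values so that __exit__ can restore them.
        self._old = {k: _config[k] for k in kwargs}
        _config.update(copy.deepcopy(kwargs))

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        _config.update(self._old)
        return False  # never swallow exceptions


def refresh():
    """Reset every key to its default value."""
    _config.clear()
    _config.update(copy.deepcopy(_DEFAULTS))


cf_names = {"lon": "longitude", "lat": "latitude",
            "lon_b": "longitude_bnds", "lat_b": "latitude_bnds"}

with set_config(grid_name_dict=cf_names):
    inside = get("grid_name_dict")   # the user-specified names
outside = get("grid_name_dict")      # back to the previous (default) names
```

Calling set_config(...) without a with statement changes the configuration globally until refresh() is called, which matches the two usage modes in the example above.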

2. Implement utility functions for reformatting cell boundaries

Change (n_lat, n_lon, 4) to (n_lat+1, n_lon+1), similar to OCGIS https://github.com/JiaweiZhuang/xESMF/issues/32#issuecomment-419496538

grid_bounds = xesmf.util.convert_corners(grid_bounds_with_4_corners)
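As a sketch of what such a conversion could look like with NumPy: the version below is not an existing xesmf function. It assumes a contiguous grid (neighbouring cells share their corners) and a counterclockwise corner order starting at the lower-left, i.e. index 0 = (j, i), 1 = (j, i+1), 2 = (j+1, i+1), 3 = (j+1, i). Data with a different corner order would need a different index mapping.

```python
import numpy as np


def convert_corners(corners):
    """Collapse (n_lat, n_lon, 4) cell corners to (n_lat + 1, n_lon + 1) nodes.

    Assumes shared corners between neighbouring cells and the corner order
    lower-left, lower-right, upper-right, upper-left.
    """
    corners = np.asarray(corners)
    if corners.ndim != 3 or corners.shape[-1] != 4:
        raise ValueError(f"expected shape (n_lat, n_lon, 4), got {corners.shape}")
    n_lat, n_lon, _ = corners.shape
    nodes = np.empty((n_lat + 1, n_lon + 1), dtype=corners.dtype)
    nodes[:-1, :-1] = corners[:, :, 0]   # lower-left corner of every cell
    nodes[:-1, -1] = corners[:, -1, 1]   # lower-right corners, last column
    nodes[-1, -1] = corners[-1, -1, 2]   # upper-right corner, last cell
    nodes[-1, :-1] = corners[-1, :, 3]   # upper-left corners, last row
    return nodes


# Round trip on a small grid: build 4-corner bounds from known nodes.
lon_nodes, lat_nodes = np.meshgrid(np.linspace(0.0, 40.0, 5),
                                   np.linspace(-30.0, 30.0, 4))


def to_four_corners(nodes):
    """Inverse operation, used here only to create test input."""
    return np.stack([nodes[:-1, :-1], nodes[:-1, 1:],
                     nodes[1:, 1:], nodes[1:, :-1]], axis=-1)


lat_b = convert_corners(to_four_corners(lat_nodes))
lon_b = convert_corners(to_four_corners(lon_nodes))
```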

Another very useful util is inferring boundaries from centers https://github.com/JiaweiZhuang/xESMF/issues/13#issuecomment-416947178:

grid_with_bounds = xesmf.util.infer_bounds(grid_without_bounds)
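For a rectilinear grid with 1-D centers, one simple way to infer boundaries is to take midpoints between neighbouring centers and extrapolate half a cell at both ends. The helper below is a sketch under that assumption, not the implementation discussed in the linked comment; it does not clip latitudes to [-90, 90] and does not handle curvilinear grids.

```python
import numpy as np


def infer_bounds_1d(centers):
    """Infer n + 1 cell boundaries from n cell centers along one axis."""
    centers = np.asarray(centers, dtype=float)
    if centers.ndim != 1 or centers.size < 2:
        raise ValueError("need a 1-D array with at least two centers")
    # Interior boundaries sit halfway between neighbouring centers.
    mid = 0.5 * (centers[:-1] + centers[1:])
    # Outer boundaries mirror the nearest interior boundary about the
    # first and last centers.
    first = 2.0 * centers[0] - mid[0]
    last = 2.0 * centers[-1] - mid[-1]
    return np.concatenate(([first], mid, [last]))


# Example: a 5-degree global grid given only by its cell centers.
lat_b_1d = infer_bounds_1d(np.arange(-87.5, 90.0, 5.0))
lon_b_1d = infer_bounds_1d(np.arange(2.5, 360.0, 5.0))

# Broadcast to the 2-D (n_lat + 1, n_lon + 1) layout.
lon_b, lat_b = np.meshgrid(lon_b_1d, lat_b_1d)
```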

Optionally, those functions can be wrapped in the high-level API like xe.Regridder(..., boundary_format='4_corners') or xe.Regridder(..., boundary_format='inferred').

3. Simple support for CF-convention, built on step 1 and 2

Given the popularity of the CF convention, it makes sense to support such input data out of the box (#38 #73). I emphasize “simple” because xesmf has no reason to check all CF-compliant attributes such as units = 'degrees_east' or standard_name = 'latitude'; this is not the task of a regridding package.

For coordinate naming, one can just call xesmf.config.set(grid_name_dict=xe.config.cf_grid_name), where

xe.config.cf_grid_name = {
    'lon': 'longitude', 'lat': 'latitude',
    'lon_b': 'longitude_bnds', 'lat_b': 'latitude_bnds'}

is pre-defined for convenience. More pre-defined dictionaries could be added for other names like latitude_bounds or lat_bnds, or users could simply set their own.

The boundary formatting should explicitly go through step 2. Handling the boundary decoding automatically can often lead to corner cases and errors. For example, what if the input grid is a 4-tile grid of shape (n_lat, n_lon, 4) but gets misinterpreted as 4 corners? Should the Regridder throw an error when seeing 3-D grids, or check whether it is another representation?
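If the Regridder were to take the strict route and throw an error, the check could be as small as the sketch below. The function check_boundary_shape is hypothetical; it only compares shapes and deliberately does not try to decide whether a 3-D array holds 4 corners or 4 tiles.

```python
def check_boundary_shape(center_shape, bound_shape):
    """Raise unless the boundaries have shape (n_lat + 1, n_lon + 1)."""
    center_shape = tuple(center_shape)
    bound_shape = tuple(bound_shape)
    if len(center_shape) != 2:
        raise ValueError(
            f"centers must have shape (n_lat, n_lon), got {center_shape}"
        )
    n_lat, n_lon = center_shape
    expected = (n_lat + 1, n_lon + 1)
    if bound_shape == expected:
        return
    hint = ""
    if bound_shape == (n_lat, n_lon, 4):
        # No automatic conversion: the trailing axis could be 4 corners
        # or something else entirely, so the user has to decide.
        hint = (" If the last axis holds 4 cell corners, convert the array"
                " explicitly before regridding.")
    raise ValueError(
        f"boundaries must have shape {expected}, got {bound_shape}.{hint}"
    )


check_boundary_shape((3, 4), (4, 5))  # passes silently
```

The error message points the user to the explicit conversion step instead of performing it for them.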

Any comments & suggestions? PRs are particularly welcome, especially on the 'grid_name_dict' part.

Issue Analytics

State:
Created 4 years ago
Comments:10

Top GitHub Comments

2 reactions

djhoese commented, Apr 21, 2020

Thanks @jthielen. Another person I think should be kept in the loop in this discussion is @snowman2 who maintains pyproj and rioxarray. I bring these projects up because:

pyproj has the ability to convert CRS information to/from CF definitions. I recently switched some of my projects to depending on it for this.
rioxarray is a good example of using xarray accessors to deal with various headaches that this project may run into. For example, defining which dimensions represent which geographic dimension (“x” and “y” versus “lon” and “lat” or any other odd naming that may exist in the wild).

The combination of these two projects has resulted in something that I’ve thought is much better than what I was attempting in geoxarray. @snowman2 creates a “spatial_ref” coordinate variable which itself has a crs_wkt attribute (the WKT version of the Coordinate Reference System). I think you can then also copy other coordinate variables like x/y to this spatial_ref variable. This has the benefit of holding on to CRS information and making it easy to access where xarray may have dropped .attrs or coordinate variables in other implementations.

Edit: I should have mentioned that I’d like to turn geoxarray into something similar to rioxarray but not rasterio specific (no rasterio/gdal dependency).

1 reaction

huard commented, Apr 22, 2020

I agree something like cf-xarray would be useful. I’m concerned however that diving into this topic here could distract from the issue at hand.

Is there a place where all the potential use cases for cf-xarray could be listed, with links to existing code? I think having this in one place would be helpful in designing an API for cf-xarray.