Handle different grid coordinate formats and naming
This is a meta-issue summarizing these frequently asked questions:
- Coordinate naming: #5 #13 #38 #73
- Boundary formatting: https://github.com/JiaweiZhuang/xESMF/issues/14#issuecomment-369686779, https://github.com/JiaweiZhuang/xESMF/issues/32#issuecomment-418909576
- Upstream: pydata/xarray#475
The problem
xESMF requires the input grid objects to contain variables lon/lat of shape (n_lat, n_lon), and optionally lon_b/lat_b of shape (n_lat+1, n_lon+1) for conservative regridding.
This leads to a naming problem, because the original names might be latitude, lat_bnds, or latitude_bnds, and to a boundary formatting problem, because the original boundary array might have shape (n_lat, n_lon, 4) instead of (n_lat+1, n_lon+1). The current fix is to rename the coordinates (#5) and reformat the boundaries (https://github.com/JiaweiZhuang/xESMF/issues/14#issuecomment-369686779).
The two problems often occur together: the CF convention uses the name lat_bnds with a shape of (n_lat, n_lon, 4).
An upstream cause is that an xarray.Dataset has no notion of cell boundaries; other packages like xgcm try to work around that (#13). Xarray also does not enforce the CF convention.
Desired features for the solution
1. Unambiguous
xesmf should always be very clear and strict about the expected grid format. This prevents tricky edge cases and user confusion. What if the input dataset/dictionary contains all three variables lat_b, latitude_b, and latitude_bnd? Which one is picked? Or should it throw a “duplication error”? (https://github.com/JiaweiZhuang/xESMF/pull/38#issuecomment-431536946)
There can be options to choose one of the names, but at any given time there should be only one valid name, which can be printed out by something like xesmf.get_config().
The expected boundary array format also has to be explicit. There can be an extra preprocessing function to reformat the boundary, but such an option has to be set explicitly so that users are aware of what they are doing.
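The strictness described above could be sketched as a lookup that accepts exactly one configured name per grid variable and fails loudly otherwise. The helper name resolve_grid_names and the plain-dict input are hypothetical stand-ins for illustration, not part of xesmf.

```python
def resolve_grid_names(grid, name_dict, required=("lon", "lat")):
    """Return the grid variables keyed by their standard names.

    `grid` is any mapping (for example a dict of arrays) and `name_dict`
    maps each standard name to the single name expected in `grid`.
    Hypothetical helper: only the configured name is looked up, so
    look-alike names such as 'latitude_b' are never guessed.
    """
    resolved = {}
    for standard, actual in name_dict.items():
        if actual in grid:
            resolved[standard] = grid[actual]
        elif standard in required:
            # Required variable missing: report what was expected and found.
            raise KeyError(
                f"expected variable {actual!r} for {standard!r}; "
                f"found {sorted(grid)}"
            )
    return resolved
```

Optional boundary variables are simply skipped when absent; only the names listed in `required` trigger an error.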
2. Simple
This is a simple problem and needs a simple, intuitive solution. Although annoying, this issue does not affect xesmf’s core functionality (the regridding computation). I am hesitant to add complex preprocessing logic or clever heuristics to guess the names and tweak the grid format. Complex code adds maintenance cost and causes confusion for users who do not need such a feature.
An alternative, simple “fix” for this issue is to add some example notebooks showing how to preprocess various grid formats, without complicating the package source code.
If changes are made in the source code, there should not be too many lines of new code or many if/else switches; those are an indication of complex logic.
Proposed solutions
1. Allow custom coordinate naming, by implementing xesmf.config.set(grid_name_dict=...) as global config or context manager.
(Originally proposed at https://github.com/JiaweiZhuang/xESMF/pull/38#issuecomment-432028269)
The grid_name_dict exactly follows xarray.Dataset.rename(name_dict). xesmf.config.set works similarly to xarray.set_options or dask.config.set. There should also be an xesmf.config.get('grid_name_dict') to print the currently expected grid names.
Example usage:
import xesmf as xe
xe.config.get('grid_name_dict')  # prints {'lon': 'lon', 'lat': 'lat', 'lon_b': 'lon_b', 'lat_b': 'lat_b'}, which is the default
grid_name_dict = {'lon': 'longitude', 'lat': 'latitude',
                  'lon_b': 'longitude_bnds', 'lat_b': 'latitude_bnds'}
with xe.config.set(grid_name_dict=grid_name_dict):  # set it locally
    xe.Regridder(...)  # automatically uses the user-specified `grid_name_dict`
    xe.config.get('grid_name_dict')  # prints the modified config
xe.config.set(grid_name_dict=grid_name_dict)  # or set it globally
xe.Regridder(...)  # always uses the new global config
xe.config.refresh()  # back to default, similar to dask.config.refresh()
xesmf.config.set might also be used to set other general configurations, although I haven’t thought of an example. If there are no other configurable parameters, we could also just implement a pair of single-purpose functions, xesmf.set_grid_name() / xesmf.get_grid_name().
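One way a setter could serve as both a global switch and a context manager is to apply the change on construction and restore the previous value on exit. This is a minimal sketch of that pattern, not xesmf code. The names set_config, get, and refresh are hypothetical; in the proposal they would live in xesmf.config as set, get, and refresh.

```python
import copy

# Default expected names, taken from the proposal above.
_DEFAULTS = {
    "grid_name_dict": {"lon": "lon", "lat": "lat", "lon_b": "lon_b", "lat_b": "lat_b"}
}
_config = copy.deepcopy(_DEFAULTS)


def get(key):
    """Return the current value of a config entry."""
    return _config[key]


def refresh():
    """Reset every entry to its default."""
    _config.clear()
    _config.update(copy.deepcopy(_DEFAULTS))


class set_config:
    """Apply settings immediately; restore the old ones when used in `with`."""

    def __init__(self, **kwargs):
        unknown = kwargs.keys() - _config.keys()
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        self._old = {key: _config[key] for key in kwargs}
        _config.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        _config.update(self._old)
```

Calling set_config(grid_name_dict=...) on its own changes the global state until refresh() is called, while the with form undoes the change when the block ends.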
2. Implement utility functions for reformatting cell boundaries
Change (n_lat, n_lon, 4) to (n_lat+1, n_lon+1), similar to OCGIS (https://github.com/JiaweiZhuang/xESMF/issues/32#issuecomment-419496538):
grid_bounds = xesmf.util.convert_corners(grid_bounds_with_4_corners)
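A minimal NumPy sketch of what such a conversion might look like follows. It assumes contiguous cells (neighbouring cells share their corners) and a counterclockwise corner order starting at the lower-left corner: index 0 is (j, i), 1 is (j, i+1), 2 is (j+1, i+1), 3 is (j+1, i). Data with a different ordering would need the indices adjusted. The function body is illustrative and not the xesmf implementation.

```python
import numpy as np


def convert_corners(bounds):
    """Collapse a (n_lat, n_lon, 4) corner array to (n_lat+1, n_lon+1).

    Assumes contiguous cells and corner order
    0: lower-left, 1: lower-right, 2: upper-right, 3: upper-left.
    """
    bounds = np.asarray(bounds)
    if bounds.ndim != 3 or bounds.shape[-1] != 4:
        raise ValueError(f"expected shape (n_lat, n_lon, 4), got {bounds.shape}")
    n_lat, n_lon, _ = bounds.shape
    out = np.empty((n_lat + 1, n_lon + 1), dtype=bounds.dtype)
    out[:-1, :-1] = bounds[:, :, 0]   # lower-left corner of every cell
    out[:-1, -1] = bounds[:, -1, 1]   # lower-right corners of the last column
    out[-1, :-1] = bounds[-1, :, 3]   # upper-left corners of the last row
    out[-1, -1] = bounds[-1, -1, 2]   # upper-right corner of the last cell
    return out
```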
Another very useful util is inferring boundaries from centers https://github.com/JiaweiZhuang/xESMF/issues/13#issuecomment-416947178:
grid_with_bounds = xesmf.util.infer_bounds(grid_without_bounds)
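For a rectilinear grid, inferring boundaries can be sketched in one dimension: take midpoints between neighbouring centers and extrapolate the two outer edges by the same half-spacing. The function name infer_bounds_1d is hypothetical, and the sketch ignores curvilinear grids, pole clipping, and longitude wrap-around, which a real utility would need to handle.

```python
import numpy as np


def infer_bounds_1d(centers):
    """Infer n+1 cell edges from n cell centers along one axis."""
    centers = np.asarray(centers, dtype=float)
    if centers.ndim != 1 or centers.size < 2:
        raise ValueError("need a 1-D array with at least two centers")
    mid = 0.5 * (centers[:-1] + centers[1:])          # interior edges
    first = centers[0] - (mid[0] - centers[0])        # extrapolated lower edge
    last = centers[-1] + (centers[-1] - mid[-1])      # extrapolated upper edge
    return np.concatenate([[first], mid, [last]])
```

The 1-D edges can then be expanded to the 2-D (n_lat+1, n_lon+1) layout with np.meshgrid(lon_edges, lat_edges).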
Optionally, those functions can be wrapped in the high-level API like xe.Regridder(..., boundary_format='4_corners') or xe.Regridder(..., boundary_format='inferred').
3. Simple support for CF-convention, built on step 1 and 2
Given the popularity of the CF convention, it makes sense to support such input data out of the box (#38 #73). I emphasize “simple” because xesmf has no reason to check all CF-compliant attributes such as units = 'degrees_east' or standard_name = 'latitude'; this is not the task of a regridding package.
For coordinate naming, one can just set xesmf.config.set(grid_name_dict=xe.config.cf_grid_name), where
xe.config.cf_grid_name = {
    'lon': 'longitude', 'lat': 'latitude',
    'lon_b': 'longitude_bnds', 'lat_b': 'latitude_bnds'}
is pre-defined for convenience. We can also add more pre-defined dictionaries for other names like latitude_bounds or lat_bnds, or simply let users set their own.
The boundary formatting should explicitly go through step 2. Handling the boundary decoding automatically can often lead to corner cases and errors. For example, what if the input grid is a 4-tile grid of shape (n_lat, n_lon, 4), but gets misinterpreted as 4 corners? Should the Regridder throw an error when seeing 3-D grids, or check whether it is another representation?
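One possible answer, sketched here under the assumption that the boundary_format keyword from step 2 exists, is to refuse to guess: a 3-D boundary array is accepted only when the user has explicitly declared it as 4 corners. The function check_bounds and its return labels are hypothetical.

```python
import numpy as np


def check_bounds(bounds, center_shape, boundary_format=None):
    """Classify a boundary array without guessing its meaning.

    Returns 'corners_2d' for the native (n_lat+1, n_lon+1) layout, or
    '4_corners' for (n_lat, n_lon, 4) when explicitly requested.
    """
    bounds = np.asarray(bounds)
    n_lat, n_lon = center_shape
    if bounds.shape == (n_lat + 1, n_lon + 1):
        return "corners_2d"
    if bounds.ndim == 3:
        if boundary_format != "4_corners":
            # Could be 4 corners or, say, 4 tiles: make the user decide.
            raise ValueError(
                "3-D boundary array is ambiguous; pass "
                "boundary_format='4_corners' if it holds cell corners"
            )
        if bounds.shape != (n_lat, n_lon, 4):
            raise ValueError(f"expected {(n_lat, n_lon, 4)}, got {bounds.shape}")
        return "4_corners"
    raise ValueError(f"unsupported boundary shape {bounds.shape}")
```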
Any comments & suggestions? PRs are particularly welcome, especially on the grid_name_dict part.
Top GitHub Comments
Thanks @jthielen. Another person I think should be kept in the loop in this discussion is @snowman2 who maintains pyproj and rioxarray. I bring these projects up because:
The combination of these two projects has resulted in something that I’ve thought is much better than what I was attempting in geoxarray. @snowman2 creates a “spatial_ref” coordinate variable which itself has a crs_wkt attribute (the WKT version of the Coordinate Reference System). I think you can then also copy other coordinate variables like x/y to this spatial_ref variable. This has the benefit of holding on to CRS information and making it easy to access where xarray may have dropped .attrs or coordinate variables in other implementations.
Edit: I should have mentioned that I’d like geoxarray to become something similar to rioxarray but not rasterio specific (no rasterio/gdal dependency).
I agree something like cf-xarray would be useful. I’m concerned however that diving into this topic here could distract from the issue at hand.
Is there a place where all the potential use cases for cf-xarray could be listed, with links to existing code? I think having this in one place would be helpful in designing an API for cf-xarray.