grid.coarsen function
Another feature that I think will be feasible and useful in the near future (once grid metrics, e.g. #81, are available) is a function to coarsen a full dataset that is aware of the grid metrics and of how to coarsen them.
As an example, let's say we have ocean model output at 1/2 degree resolution and want to coarsen it to 2 degrees, i.e. aggregate 4 grid points in each dimension.
I have previously used my own function to coarsen DataArrays that contain dask arrays (from this module):
```python
import warnings

import numpy as np
import xarray as xr
from dask.array import Array, coarsen


def aggregate(da, blocks, func=np.nanmean, debug=False):
    """
    Performs efficient block averaging in one or multiple dimensions.
    Only works on regular grid dimensions.

    Parameters
    ----------
    da : xarray DataArray (must wrap a dask array!)
    blocks : list
        List of tuples containing the dimension and interval to aggregate over
    func : function
        Aggregation function. Defaults to numpy.nanmean

    Returns
    -------
    da_agg : xarray DataArray
        Aggregated array

    Examples
    --------
    >>> from xarrayutils import aggregate
    >>> import numpy as np
    >>> import xarray as xr
    >>> import dask.array as da
    >>> x = np.arange(-10, 10)
    >>> y = np.arange(-10, 10)
    >>> xx, yy = np.meshgrid(x, y)
    >>> z = xx**2 - yy**2
    >>> a = xr.DataArray(da.from_array(z, chunks=(20, 20)),
    ...                  coords={'x': x, 'y': y}, dims=['y', 'x'])
    >>> print(a)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 20, x: 20)>
    dask.array<array-7..., shape=(20, 20), dtype=int64, chunksize=(20, 20)>
    Coordinates:
      * y        (y) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
      * x        (x) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
    >>> blocks = [('x', 2), ('y', 10)]
    >>> a_coarse = aggregate(a, blocks, func=np.mean)
    >>> print(a_coarse)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 2, x: 10)>
    dask.array<coarsen..., shape=(2, 10), dtype=float64, chunksize=(2, 10)>
    Coordinates:
      * y        (y) int64 -10 0
      * x        (x) int64 -10 -8 -6 -4 -2 0 2 4 6 8
    Attributes:
        Coarsened with: <function mean at 0x111754230>
        Coarsenblocks: [('x', 2), ('y', 10)]
    """
    # Check that the input wraps a dask array (this could be converted
    # automatically in the future)
    if not isinstance(da.data, Array):
        raise RuntimeError('data array data must be a dask array')
    # Check data type of blocks
    # TODO write test
    if (not all(isinstance(n[0], str) for n in blocks) or
            not all(isinstance(n[1], int) for n in blocks)):
        print('blocks input', str(blocks))
        raise RuntimeError("block dimension must be dtype(str), "
                           "e.g. ('lon', 4)")
    # Check that the given array has the dimensions specified in blocks
    try:
        block_dict = dict((da.get_axis_num(x), y) for x, y in blocks)
    except ValueError:
        raise RuntimeError("'blocks' contains a non-matching dimension")
    # Determine the size of the excess in each aggregated axis
    blocks = [(a[0], a[1], da.shape[da.get_axis_num(a[0])] % a[1])
              for a in blocks]
    # for now, default to trimming the excess
    da_coarse = coarsen(func, da.data, block_dict, trim_excess=True)
    # for now, only the dimension coordinates are carried over
    new_coords = dict([])
    warnings.warn("WARNING: only dimensions are carried over as coordinates")
    for cc in list(da.dims):
        new_coords[cc] = da.coords[cc]
        for dd in blocks:
            if dd[0] in list(da.coords[cc].dims):
                new_coords[cc] = new_coords[cc].isel(
                    **{dd[0]: slice(0, -(1 + dd[2]), dd[1])})
    attrs = {'Coarsened with': str(func), 'Coarsenblocks': str(blocks)}
    da_coarse = xr.DataArray(da_coarse, dims=da.dims, coords=new_coords,
                             name=da.name, attrs=attrs)
    return da_coarse
```
This works nicely on single arrays, but it is cumbersome to apply to every coordinate of a dataset, since different coordinates require different reduction functions (a sketch of this bookkeeping follows the list below). E.g.:
- Tracer cell positions need to be averaged
- Tracer cell boundaries need to be the min/max depending on the orientation
- Tracer cell area and volume need to be summed.
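A minimal sketch of that manual bookkeeping, using xarray's built-in coarsen; the variable names (tracer, area, x_left, x_right) are hypothetical stand-ins for the fields of a real model dataset:

```python
import xarray as xr


def coarsen_manually(ds, factor=4):
    """Coarsen each field with the reduction appropriate to its geometry."""
    coarse = xr.Dataset()
    # Tracer values and cell-center positions: block-averaged
    coarse['tracer'] = ds['tracer'].coarsen(
        x=factor, y=factor, boundary='trim').mean()
    # Cell area: summed, so that the total area is conserved
    coarse['area'] = ds['area'].coarsen(
        x=factor, y=factor, boundary='trim').sum()
    # Cell boundaries: min/max depending on orientation
    coarse['x_left'] = ds['x_left'].coarsen(x=factor, boundary='trim').min()
    coarse['x_right'] = ds['x_right'].coarsen(x=factor, boundary='trim').max()
    return coarse
```

Every variable needs its reduction picked by hand, and the right choice is implicit knowledge about the grid rather than something encoded in the dataset itself.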
With the logic from #81 we could assign these operations based on the defined metric attributes, so that one might be able to call:
```python
coarse = grid.coarsen(ds, {'X': 4, 'Y': 4}, func='mean')
```
which would give a fully consistent dataset that can be fed into xgcm.Grid again, so one can work seamlessly with a downsampled version of the dataset.
@rabernat, as always I would love to hear whether you think this is within the scope of xgcm. As before, this might not be an immediate priority, but I wanted to put some of these thoughts out here so that other people can join the discussion.
Top GitHub Comments
I am still very keen on having this, and marked it to stay open.
Since @roxyboy revived this on Slack, I wanted to quickly update what I envision here. Hopefully others can chime in.
I would like to be able to eventually do this:
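A sketch of the envisioned call, reconstructed from the proposal above (grid.coarsen does not exist yet, so the exact signature and the metrics keyword are assumptions):

```python
import xgcm

# ds is a model dataset; metrics maps axes to metric variable names,
# as introduced with the grid metrics work (#81).
grid = xgcm.Grid(ds, metrics=metrics)

# Coarsen data and metrics together, with the per-variable reduction
# (mean/sum/min/max) inferred from the metric attributes:
ds_coarse = grid.coarsen(ds, {'X': 4, 'Y': 4}, func='mean')

# The result should be consistent enough to build a new grid from:
grid_coarse = xgcm.Grid(ds_coarse, metrics=metrics)
```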
We almost have all the machinery to do this, since xgcm is now aware of grid metrics and xarray already supports coarsening along dimensions out of the box (https://xarray.pydata.org/en/stable/generated/xarray.DataArray.coarsen.html).
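For reference, a minimal, self-contained example of that xarray building block on synthetic data; only the per-variable choice of reduction is the part grid.coarsen would automate:

```python
import numpy as np
import xarray as xr

# Synthetic half-degree field plus a cell-width "metric" on the same grid.
lon = np.arange(0.25, 360, 0.5)
da = xr.DataArray(np.random.rand(lon.size), coords={'lon': lon}, dims='lon')
width = xr.ones_like(da)

# xarray handles the block reduction out of the box; picking .mean() for
# the tracer but .sum() for the metric is what grid.coarsen would automate.
da_coarse = da.coarsen(lon=4, boundary='trim').mean()
width_coarse = width.coarsen(lon=4, boundary='trim').sum()
```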