grid.coarsen function

Another feature that I think would be feasible and useful in the near future (once metrics, e.g. #81, are available) is a function to coarsen a full dataset that is aware of the grid metrics and knows how to coarsen them.

As an example, let's say we have ocean model output at 1/2 degree resolution and want to coarsen it to 2 degrees, i.e. aggregate 4 grid points in each dimension.

I have previously used my own function to coarsen DataArrays that contain dask arrays (from this module):

import warnings

import numpy as np
import xarray as xr
from dask.array import Array, coarsen


def aggregate(da, blocks, func=np.nanmean, debug=False):
    """
    Performs efficient block averaging in one or multiple dimensions.
    Only works on regular grid dimensions.
    Parameters
    ----------
    da : xarray.DataArray (its data must be a dask array)
    blocks : list
        List of tuples containing the dimension and interval to aggregate over
    func : function
        Aggregation function. Defaults to numpy.nanmean.
    Returns
    -------
    da_agg : xarray.DataArray
        Aggregated array
    Examples
    --------
    >>> from xarrayutils import aggregate
    >>> import numpy as np
    >>> import xarray as xr
    >>> import dask.array as da
    >>> x = np.arange(-10,10)
    >>> y = np.arange(-10,10)
    >>> xx,yy = np.meshgrid(x,y)
    >>> z = xx**2-yy**2
    >>> a = xr.DataArray(da.from_array(z, chunks=(20, 20)),
                         coords={'x':x,'y':y}, dims=['y','x'])
    >>> print(a)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 20, x: 20)>
    dask.array<array-7..., shape=(20, 20), dtype=int64, chunksize=(20, 20)>
    Coordinates:
      * y        (y) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
      * x        (x) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
    >>> blocks = [('x', 2), ('y', 10)]
    >>> a_coarse = aggregate(a,blocks,func=np.mean)
    >>> print(a_coarse)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 2, x: 10)>
    dask.array<coarsen..., shape=(2, 10), dtype=float64, chunksize=(2, 10)>
    Coordinates:
      * y        (y) int64 -10 0
      * x        (x) int64 -10 -8 -6 -4 -2 0 2 4 6 8
    Attributes:
        Coarsened with: <function mean at 0x111754230>
        Coarsenblocks: [('x', 2), ('y', 10)]
    """
    # Check if the input is a dask array (this might be converted
    # automatically in the future)
    if not isinstance(da.data, Array):
        raise RuntimeError('DataArray data must be a dask array')
    # Check data type of blocks
    # TODO write test
    if (not all(isinstance(n[0], str) for n in blocks) or
            not all(isinstance(n[1], int) for n in blocks)):
        print('blocks input', str(blocks))
        raise RuntimeError(
            "blocks must be (str, int) tuples, e.g. ('lon', 4)")

    # Check if the given array has the dimension specified in blocks
    try:
        block_dict = dict((da.get_axis_num(x), y) for x, y in blocks)
    except ValueError:
        raise RuntimeError("'blocks' contains non matching dimension")

    # Check the size of the excess in each aggregated axis
    blocks = [(a[0], a[1], da.shape[da.get_axis_num(a[0])] % a[1])
              for a in blocks]

    # for now default to trimming the excess
    da_coarse = coarsen(func, da.data, block_dict, trim_excess=True)

    # for now default to only the dims
    new_coords = dict([])
    # for cc in da.coords.keys():
    warnings.warn("WARNING: only dimensions are carried over as coordinates")
    for cc in list(da.dims):
        new_coords[cc] = da.coords[cc]
        for dd in blocks:
            if dd[0] in list(da.coords[cc].dims):
                new_coords[cc] = \
                    new_coords[cc].isel(
                        **{dd[0]: slice(0, -(1 + dd[2]), dd[1])})

    attrs = {'Coarsened with': str(func), 'Coarsenblocks': str(blocks)}
    da_coarse = xr.DataArray(da_coarse, dims=da.dims, coords=new_coords,
                             name=da.name, attrs=attrs)
    return da_coarse

This works nicely on single arrays, but it is cumbersome to do for all the coordinates, since different coordinates require different aggregation functions (a sketch of how this could look follows the list below).

E.g.

  • Tracer cell positions need to be averaged
  • Tracer cell boundaries need to be the min/max depending on the orientation
  • Tracer cell area and volume need to be summed.
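
One way this per-variable dispatch could look, as a minimal sketch built only on xarray's Dataset.coarsen rather than any existing xgcm API (the function name, the reductions mapping, and the variable names are all illustrative):

import numpy as np
import xarray as xr

def coarsen_with_reductions(ds, windows, reductions, default='mean'):
    """Coarsen each variable of `ds` by `windows` (e.g. {'x': 4, 'y': 4}),
    picking the reduction per variable from `reductions`."""
    out = {}
    for name, var in ds.data_vars.items():
        # Only coarsen along the dimensions this variable actually has
        win = {dim: w for dim, w in windows.items() if dim in var.dims}
        if not win:
            out[name] = var
            continue
        coarse = var.coarsen(win, boundary='trim')
        # e.g. 'mean' for tracer positions, 'min'/'max' for cell boundaries,
        # 'sum' for cell area and volume
        out[name] = getattr(coarse, reductions.get(name, default))()
    return xr.Dataset(out)

# Hypothetical usage: average 'temp' but sum 'area' so totals stay consistent
ds = xr.Dataset({'temp': (('y', 'x'), np.random.rand(8, 8)),
                 'area': (('y', 'x'), np.ones((8, 8)))})
ds_coarse = coarsen_with_reductions(ds, {'x': 4, 'y': 4}, {'area': 'sum'})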

With the logic from #81 we could assign these operations based on the defined metrics attributes, so that one might be able to call

coarse = grid.coarsen(ds, {'X': 4, 'Y': 4}, func='mean')

which would give a fully consistent dataset that can be fed into xgcm.Grid again, making it seamless to work with a downsampled version of the dataset.

@rabernat, as always I would love to hear whether you think this is within the scope of xgcm. As before, this might not be an immediate priority, but I wanted to put some of these thoughts out here so that other people can join the discussion.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

3 reactions
jbusecke commented, Jun 11, 2021

I am still very keen on having this, and marked it to stay open.

2 reactions
jbusecke commented, Oct 1, 2019

Since @roxyboy revived this on Slack, I wanted to quickly update what I envision here. Hopefully others can chime in.

I would like to be able to eventually do this:

ds = ... # dataset with metrics, data on various grid points
grid = Grid(ds)
# Calculate the gradient on the native grid
temp_gradx = grid.diff(ds['temp'], 'X') / ds['dx']

# This step should coarsen all the data points and metrics (e.g. max for the
# right boundary, min for the left boundary, sum for area, distance, etc.)
# and put them on the right grid positions, so that a new Grid object can be
# created from the new dataset

ds_coarse = grid.coarsen(ds, {'X':5, 'Y':10})
grid_coarse = Grid(ds_coarse, metrics={....},...)

# Calculate the gradient on the coarser grid
temp_gradx_coarse = grid_coarse.diff(ds_coarse['temp'], 'X') / ds_coarse['dx']

We almost have all the machinery to do this, since xgcm is now aware of grid metrics and xarray already supports coarsening along dimensions out of the box (https://xarray.pydata.org/en/stable/generated/xarray.DataArray.coarsen.html).
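
For reference, that built-in xarray coarsening looks like this (a minimal, self-contained example; the factor of 5 is arbitrary):

import numpy as np
import xarray as xr

# Coarsen a DataArray by a factor of 5 along 'x', trimming any excess points
da = xr.DataArray(np.arange(100.0), dims='x', coords={'x': np.arange(100)})
da_coarse = da.coarsen(x=5, boundary='trim').mean()
print(da_coarse.sizes)  # {'x': 20}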
