Shape preserving `diff` via new keywords
Currently, an operation such as ds.diff('x') returns a result that is one element shorter along the differenced dimension, e.g.,
In [1]: import xarray as xr
In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})
In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions: (x: 3)
Coordinates:
* x (x) int64 1 2 3
Data variables:
foo (x) int64 1 2 3
In [4]: ds.diff('x')
Out[4]:
<xarray.Dataset>
Dimensions: (x: 2)
Coordinates:
* x (x) int64 2 3
Data variables:
foo (x) int64 1 1
However, there are cases where it would be beneficial to keep the same size, so that you would get
In [1]: import xarray as xr
In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})
In [3]: ds.diff('x', preserve_shape=True, empty_value=0)
Out[3]:
<xarray.Dataset>
Dimensions: (x: 3)
Coordinates:
* x (x) int64 1 2 3
Data variables:
foo (x) int64 0 1 1
Is there interest in adding a preserve_shape=True keyword that produces this shape-preserving behavior? I’m proposing that it could be used with both label='upper' and label='lower'.
empty_value could give the fill value directly, or empty_index could give an index for the fill value. If empty_value=None and empty_index=None, the fill would be NaN.
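For comparison, the requested result can already be approximated with existing xarray calls. The following is only a minimal sketch: diff_preserve_shape is a hypothetical helper name, not part of xarray, and its fill_value argument stands in for the proposed empty_value. It assumes the differenced dimension has a coordinate with unique labels, since it relies on reindex to restore the dropped label.

```python
import numpy as np
import xarray as xr


def diff_preserve_shape(obj, dim, label="upper", fill_value=np.nan):
    """Hypothetical helper: difference along `dim`, then restore full length."""
    # The existing diff drops one label along `dim`: the first one for
    # label="upper", the last one for label="lower".
    shortened = obj.diff(dim, label=label)
    # Reindexing onto the original coordinate brings the dropped label back
    # and fills its data with fill_value (NaN by default).
    return shortened.reindex({dim: obj[dim]}, fill_value=fill_value)


ds = xr.Dataset({"foo": (("x",), [1, 2, 3])}, {"x": [1, 2, 3]})
upper = diff_preserve_shape(ds, "x", label="upper", fill_value=0)
lower = diff_preserve_shape(ds, "x", label="lower", fill_value=0)
print(upper["foo"].values)  # values 0, 1, 1: the fill lands on the first label
print(lower["foo"].values)  # values 1, 1, 0: the fill lands on the last label
```

With the default fill_value of NaN, integer data would be promoted to floating point, which is one reason an explicit fill value is convenient.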
I’m asking the community because this is at least the second time I’ve encountered an application where this behavior would be helpful, e.g., computing ocean layer thicknesses from bottom depths. A previous application was computing a time step from time slice output, with the aim of using that product in an approximated integral, e.g.,
y*diff(t, label='lower', preserve_shape=True)
where y and t are both of size n, which is effectively a left-sided Riemann sum.
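The left-sided Riemann sum can be illustrated with current xarray operations. This is a rough sketch with made-up sample values; it emulates diff(t, label='lower', preserve_shape=True) by subtracting t from a shifted copy of itself and filling the trailing gap with zero, which is an assumption about how the proposed keyword would behave.

```python
import xarray as xr

# Made-up sample data: unevenly spaced times and a quantity sampled at them.
t = xr.DataArray([0.0, 1.0, 3.0, 6.0], dims="time")
y = xr.DataArray([2.0, 4.0, 1.0, 5.0], dims="time")

# shift(time=-1) moves each value one slot toward the start, leaving NaN in
# the last slot, so the difference is the forward step t[i+1] - t[i].
# Filling the last slot with 0 keeps dt the same size as y.
dt = (t.shift(time=-1) - t).fillna(0.0)

# Each y[i] is weighted by the interval that starts at t[i].
left_sum = float((y * dt).sum())
print(left_sum)  # 2*1 + 4*2 + 1*3 + 5*0 = 13.0
```

Because the last interval has zero width, the final sample contributes nothing, as expected for a left-sided sum.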

As I mentioned in #1288, I think basic functionality like integrate and gradient is totally within appropriate scope for xarray.
I recall now that people have requested similar functionality for numpy.diff: https://github.com/numpy/numpy/issues/8132. It would be nice to resolve this upstream in NumPy first (e.g., with https://github.com/numpy/numpy/pull/8206), and then simply copy the API design in xarray.
Certainly grid-aware differencing and integral operators are preferred when the grid information is known and available, but I’m not sure it follows that a more naive version akin to np.gradient would not be useful. It’s quite likely that there are xarray users (e.g., in non-climate/weather/ocean-related fields) for whom a ‘c’ grid is meaningless, yet who would still appreciate being able to easily compute derivatives via xarray operations.
But then we’re back to the valid questions raised before regarding the appropriate scope of xarray functionality, cf. https://github.com/pydata/xarray/issues/1288#issuecomment-283062107 and subsequent comments in that thread.
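To make the NumPy comparison concrete, here is a small sketch of the two naive, grid-unaware options mentioned above. It assumes a NumPy release in which np.diff accepts the prepend and append keywords (to my knowledge, 1.16 or later); the sample array is made up.

```python
import numpy as np

a = np.array([1, 2, 4, 7])

# Padding with an edge value before differencing keeps the output the same
# length as the input and puts a zero at the padded end.
upper = np.diff(a, prepend=a[0])   # zero first, then backward differences
lower = np.diff(a, append=a[-1])   # forward differences, then zero

# np.gradient also preserves shape, but it uses central differences in the
# interior and one-sided differences at the two edges.
grad = np.gradient(a.astype(float))

print(upper)  # 0, 1, 2, 3
print(lower)  # 1, 2, 3, 0
print(grad)   # 1.0, 1.5, 2.5, 3.0
```

The two approaches answer slightly different questions: padded diff returns step sizes, while np.gradient returns a derivative estimate at each point.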