Add trapz to DataArray for mathematical integration
See original GitHub issueSince scientific data is often an approximation to a continuous function, when we write mean() or sum(), our underlying intention is often to approximate an integral. For example, if we have temperature of a rod T(t, x) as a function of time and space, the average value Tavg(x) is the integral of T(t,x) with respect to x, divided by the length.
I would guess that in practice, many uses of mean()
and sum()
are intending to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all values is a Riemann sum approximation to an integral which is not very accurate.
For approximating an integral, it seems to me that the trapezoidal rule (trapz()
in numpy) should be preferred to sum()
or mean()
in essentially all cases, as the trapezoidal rule is more accurate while still being efficient.
It would be very useful to have trapz()
as a method of DataArrays, so one could write, e.g., for an average value, Tavg = T.trapz(dim='time') / totalTime
. Currently, I would have to use numpy’s method and then rebuild the reduced-dimensional array myself:
TavgVal= np.trapz(T, T['time'], axis=0) / totalTime
Tavg= xr.DataArray(TavgVal, coords=T['space'], dims='space')
It could even be useful to have a function like mean_trapz()
that calculates the mean value based on trapz. More generally, one could imagine having other integration methods too. E.g., data.integrate(dim='x', method='simpson')
. But trapz
is probably good enough for many cases and a big improvement over mean
, and trapz
is very simple even for unequally spaced data. And trapz
shouldn’t be much less efficient in principle, although in practice I find np.trapz()
to be several times slower than np.mean()
.
Quick examples demonstrating sum
/mean
vs. trapz
to convince you of the superiority of trapz
:
x = np.linspace(0, 2, 200)
y = 1/3 * x**3
dx = x[1] - x[0]
integralRiemann = dx * np.sum(y) # 1.3467673375251465
integralTrapz = np.trapz(y, x) # 1.3333670025167712
integralExact = 4/3 # 1.3333333333333333
This second example demonstrates the special advantages of trapz() for periodic functions because the trapezoidal rule happens to be extremely accurate for periodic functions integrated over their period.
x = np.linspace(0, 2*np.pi, 200)
y = cos(x)**2
meanRiemann = np.mean(y) # 0.50249999999999995
meanTrapz = np.trapz(y, x) / (2*np.pi) # 0.5
meanExact = 1/2 # 0.5
Issue Analytics
- State:
- Created 7 years ago
- Reactions:3
- Comments:26 (17 by maintainers)
Top GitHub Comments
If you give a mouse a cookie, he’ll ask for a glass of milk. There are a whole slew of Numpy/Scipy functions that would really benefit from using xarray to organize input/out. I’ve written wrappers for svd, fft, psd, gradient, and specgram, for starts. Perhaps a new package would be in order?
+1 for integrate. I found this thread when having the same problem.