Add trapz to DataArray for mathematical integration

See original GitHub issue

Since scientific data is often a sampled approximation to a continuous function, when we write mean() or sum(), our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time.

I would guess that in practice, many uses of mean() and sum() are intended to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all the values is a Riemann-sum approximation to the integral, which is not very accurate.

For approximating an integral, it seems to me that the trapezoidal rule (trapz() in numpy) should be preferred to sum() or mean() in essentially all cases, as the trapezoidal rule is more accurate while still being efficient.
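
For reference, the two approximations being compared can be written out for samples f_i = f(x_i) at points x_0 < x_1 < ... < x_{N-1}. This notation is introduced here for illustration and is not from the original post. The first line is the trapezoidal rule for arbitrary spacing. The second shows that, for uniform spacing h, it differs from the plain sum only by half of the two end values:

```latex
\int_{x_0}^{x_{N-1}} f(x)\,dx \;\approx\; \sum_{i=0}^{N-2} \frac{f_i + f_{i+1}}{2}\,(x_{i+1} - x_i)

h \sum_{i=0}^{N-1} f_i \;=\; \underbrace{h \sum_{i=0}^{N-2} \frac{f_i + f_{i+1}}{2}}_{\text{trapezoidal rule}} \;+\; \frac{h}{2}\,\bigl(f_0 + f_{N-1}\bigr)
```

For smooth integrands the trapezoidal error shrinks like h^2, while the extra end-point term in the plain sum only shrinks like h.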

It would be very useful to have trapz() as a method of DataArrays, so one could write, e.g., for an average value, Tavg = T.trapz(dim='time') / totalTime. Currently, I would have to use numpy’s method and then rebuild the reduced-dimensional array myself:

TavgVal = np.trapz(T, T['time'], axis=0) / totalTime  # assumes 'time' is the first dimension
Tavg = xr.DataArray(TavgVal, coords={'space': T['space']}, dims='space')
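
For readers arriving at this thread later: xarray has since gained DataArray.integrate, which applies the trapezoidal rule along a named coordinate and keeps the remaining coordinates. Below is a minimal sketch, assuming a reasonably recent xarray. The field T, its grid, and totalTime are made-up illustration values, not data from the issue:

```python
import numpy as np
import xarray as xr

# Hypothetical test field: T(t, x) = t * (1 + x), sampled on a small grid.
time = np.linspace(0.0, 2.0, 21)
space = np.linspace(0.0, 1.0, 5)
T = xr.DataArray(
    time[:, None] * (1.0 + space[None, :]),
    coords={"time": time, "space": space},
    dims=("time", "space"),
)

totalTime = time[-1] - time[0]

# Trapezoidal integration over the 'time' coordinate; 'space' is preserved,
# so there is no need to rebuild the reduced array by hand.
Tavg = T.integrate("time") / totalTime
```

Because this T is linear in t, the trapezoidal rule is exact here and Tavg equals 1 + x at every space point.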

It could even be useful to have a function like mean_trapz() that calculates the mean value based on trapz. More generally, one could imagine having other integration methods too, e.g., data.integrate(dim='x', method='simpson'). But trapz is probably good enough for many cases and a big improvement over mean, and trapz is very simple even for unequally spaced data. In principle trapz shouldn’t be much less efficient either, although in practice I find np.trapz() to be several times slower than np.mean().
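
To make the mean_trapz() idea concrete, here is a small NumPy-only sketch. The helper name and the sample data are hypothetical and not part of any library. It also shows why unequal spacing matters: a plain mean weights every sample equally, while the trapezoidal mean weights each sample by the interval it covers.

```python
import numpy as np

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


def mean_trapz(y, x):
    """Hypothetical helper: trapezoidal integral of y over x, divided by the span of x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return _trapezoid(y, x) / (x[-1] - x[0])


# Unequally spaced samples of y = 2x + 1 on [0, 2]; the true mean is 3.
x = np.array([0.0, 0.1, 0.5, 2.0])
y = 2.0 * x + 1.0

mean_plain = np.mean(y)           # 2.3: biased toward the densely sampled region
mean_integral = mean_trapz(y, x)  # 3.0: exact here because y is linear in x
```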

Quick examples demonstrating sum/mean vs. trapz to convince you of the superiority of trapz:

import numpy as np
x = np.linspace(0, 2, 200)
y = 1/3 * x**3
dx = x[1] - x[0]
integralRiemann = dx * np.sum(y)  # 1.3467673375251465
integralTrapz = np.trapz(y, x)  # 1.3333670025167712
integralExact = 4/3  # 1.3333333333333333
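
The difference in accuracy is about how the error scales, not just this one grid. As a rough sketch using the same integrand (the grid sizes 101 and 201 are chosen here only so that dx halves exactly), halving dx should roughly halve the Riemann-sum error and quarter the trapezoidal error:

```python
import numpy as np

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


def errors(n):
    """Absolute errors of the plain sum and of trapz for y = x**3 / 3 on [0, 2]."""
    x = np.linspace(0.0, 2.0, n)
    y = x**3 / 3.0
    dx = x[1] - x[0]
    exact = 4.0 / 3.0
    return abs(dx * np.sum(y) - exact), abs(_trapezoid(y, x) - exact)


riemann_coarse, trapz_coarse = errors(101)  # dx = 0.02
riemann_fine, trapz_fine = errors(201)      # dx = 0.01

riemann_ratio = riemann_coarse / riemann_fine  # close to 2: error shrinks like dx
trapz_ratio = trapz_coarse / trapz_fine        # close to 4: error shrinks like dx**2
```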

This second example demonstrates the special advantages of trapz() for periodic functions because the trapezoidal rule happens to be extremely accurate for periodic functions integrated over their period.

x = np.linspace(0, 2*np.pi, 200)
y = np.cos(x)**2
meanRiemann = np.mean(y)  #  0.50249999999999995
meanTrapz = np.trapz(y, x) / (2*np.pi)  # 0.5
meanExact = 1/2  # 0.5
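
One way to see where the 0.5025 comes from: on a closed grid the periodic point is sampled twice (at 0 and at 2*pi), so mean() over-weights it. The sketch below, with variable names made up for illustration, shows that dropping the duplicate endpoint gives the same answer as trapz for this periodic integrand:

```python
import numpy as np

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Closed grid: both 0 and 2*pi are sampled, so the same periodic point counts twice.
x_closed = np.linspace(0.0, 2.0 * np.pi, 200)
mean_closed = np.mean(np.cos(x_closed) ** 2)

# Half-open grid: the same points with the duplicate endpoint dropped.
x_open = np.linspace(0.0, 2.0 * np.pi, 199, endpoint=False)
mean_open = np.mean(np.cos(x_open) ** 2)

# Trapezoidal mean on the closed grid; it halves the weight of both end samples.
mean_trapz = _trapezoid(np.cos(x_closed) ** 2, x_closed) / (2.0 * np.pi)
```

Here mean_open and mean_trapz both agree with the exact value 1/2 to rounding error, while mean_closed carries the extra endpoint contribution.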

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions:3
  • Comments:26 (17 by maintainers)

Top GitHub Comments

3 reactions
lamorton commented, Apr 13, 2017

If you give a mouse a cookie, he’ll ask for a glass of milk. There are a whole slew of NumPy/SciPy functions that would really benefit from using xarray to organize input/output. I’ve written wrappers for svd, fft, psd, gradient, and specgram, for starters. Perhaps a new package would be in order?

2 reactions
marberi commented, May 19, 2017

+1 for integrate. I found this thread when having the same problem.


