Implement polyfit?
Fitting a line (or curve) to data along a specified axis is a long-standing need of xarray users. There are many blog posts and SO questions about how to do it:
- http://atedstone.github.io/rate-of-change-maps/
- https://gist.github.com/luke-gregor/4bb5c483b2d111e52413b260311fbe43
- https://stackoverflow.com/questions/38960903/applying-numpy-polyfit-to-xarray-dataset
- https://stackoverflow.com/questions/52094320/with-xarray-how-to-parallelize-1d-operations-on-a-multidimensional-dataset
- https://stackoverflow.com/questions/36275052/applying-a-function-along-an-axis-of-a-dask-array
The main use case in my domain is finding the temporal trend on a 3D variable (e.g. temperature in time, lon, lat).
Yes, you can do it with apply_ufunc, but apply_ufunc is inaccessibly complex for many users. Much of our existing API could be removed and replaced with apply_ufunc calls, but that doesn’t mean we should do it.
I am proposing we add a DataArray method called polyfit. It would work like this:
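import numpy as np
import xarray as xr

# build a field f(y, x) = a(y) * x, so the slope along x equals a(y)
x_ = np.linspace(0, 1, 10)
y_ = np.arange(5)
a_ = np.cos(y_)
x = xr.DataArray(x_, dims=['x'], coords={'x': x_})
a = xr.DataArray(a_, dims=['y'])
f = a*x
# proposed API: fit a degree-1 polynomial along x
p = f.polyfit(dim='x', deg=1)
# equivalent numpy code
p_ = np.polyfit(x_, f.values.transpose(), 1)
# the fitted slopes should recover a_
np.testing.assert_allclose(p_[0], a_)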
NumPy’s polyfit function is already vectorized in the sense that it accepts 1D x and 2D y, performing the fit independently over each column of y. To extend this to ND, we would just need to reshape the data going in and out of the function. We do this already in other packages. For dask, we could simply require that the dimension over which the fit is calculated be contiguous (i.e. held in a single chunk), and then call map_blocks.
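The reshaping idea can be sketched in plain NumPy. This is only a minimal illustration under stated assumptions, not the implementation proposed for xarray: the helper name polyfit_along_axis and the example data are hypothetical, and it ignores NaN handling and dask entirely.

```python
import numpy as np


def polyfit_along_axis(x, y, deg, axis=0):
    """Fit a polynomial along one axis of an N-D array (hypothetical helper).

    np.polyfit accepts 1-D x and 2-D y (one fit per column), so the N-D case
    reduces to moving the fit axis to the front, flattening the remaining
    axes into columns, fitting, and restoring the original shape.
    """
    y = np.asarray(y)
    # Put the fit axis first: shape (n, ...).
    y_moved = np.moveaxis(y, axis, 0)
    rest_shape = y_moved.shape[1:]
    # Flatten all other axes into columns: shape (n, m).
    y_2d = y_moved.reshape(y_moved.shape[0], -1)
    # One independent least-squares fit per column: shape (deg + 1, m).
    coeffs = np.polyfit(x, y_2d, deg)
    # Restore the non-fit axes: shape (deg + 1, ...).
    return coeffs.reshape((deg + 1,) + rest_shape)


# Example: a (time, lat, lon) field with a known linear trend per grid point.
time = np.arange(20.0)
slope = np.arange(12.0).reshape(3, 4)
field = time[:, None, None] * slope + 5.0

coeffs = polyfit_along_axis(time, field, deg=1, axis=0)
```

As with np.polyfit, the coefficients come back with the highest degree first, so coeffs[0] holds the per-grid-point slope and coeffs[1] the intercept.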
Thoughts?
Issue Analytics
- Created 4 years ago
- Reactions:8
- Comments:25 (19 by maintainers)
Top GitHub Comments
From a user perspective, I think people prefer to find stuff in one place.
From a maintainer perspective, as long as it’s somewhat domain agnostic (e.g., “physical sciences” rather than “oceanography”) and written to a reasonable level of code quality, I think it’s fine to toss it into xarray. “Already exists in NumPy/SciPy” is probably a reasonable proxy for the former.
So I say: yes, let’s toss in polyfit, along with fast fourier transforms.
If we’re concerned about clutter, we can put stuff in a dedicated namespace, e.g., xarray.wrappers.
I pushed a new PR trying to implement polyfit in xarray, #3733. It is still a work in progress, but I would like the opinion of those who participated in this thread.
Considering all options discussed in the thread, I chose an implementation that seemed to give the best performance and generality (skipping NaN values), but it duplicates a lot of code from numpy.polyfit.
Main question:
A lot of extra code could be removed if we decided to compute and return only the residuals and the coefficients. All the other variables are a few lines of code away for users who really want them, and they don’t need the power of xarray and dask anyway.
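For reference, numpy.polyfit itself returns the residuals alongside the coefficients when called with full=True, and quantities such as the fitted values follow in a line or two. A small sketch in plain NumPy, using made-up data, of what "a few lines of code away" might look like:

```python
import numpy as np

# Made-up data for illustration: a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(x.size)

# full=True returns (coefficients, residuals, rank, singular_values, rcond).
coeffs, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

# Other quantities are a few lines away once the coefficients are known.
fitted = np.polyval(coeffs, x)
sum_sq_resid = np.sum((y - fitted) ** 2)  # same quantity as residuals[0]
```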
I’m guessing @huard @dcherian @rabernat and @shoyer might have comments.