Consistent naming for xarray's methods that apply functions
We currently have two types of methods that take a function to apply to xarray objects:
- `pipe` (on `DataArray` and `Dataset`): apply a function to this entire object (`array.pipe(func)` -> `func(array)`)
- `apply` (on `Dataset` and `GroupBy`): apply a function to each labeled object in this object (e.g., `ds.apply(func)` -> `ds({k: func(v) for k, v in ds.data_vars.items()})`).
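To make the difference between the two concrete, here is a minimal pure-Python sketch. `ToyDataset` is a hypothetical stand-in invented for illustration, not xarray's `Dataset`; it only mimics the calling conventions described above.

```python
from typing import Any, Callable, Dict


class ToyDataset:
    """Hypothetical stand-in for a Dataset: a mapping of names to variables."""

    def __init__(self, data_vars: Dict[str, Any]) -> None:
        self.data_vars = dict(data_vars)

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # pipe: hand the whole object to func, i.e. obj.pipe(func) -> func(obj)
        return func(self, *args, **kwargs)

    def apply(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> "ToyDataset":
        # apply: call func once per variable and rebuild the container
        return ToyDataset(
            {k: func(v, *args, **kwargs) for k, v in self.data_vars.items()}
        )


ds = ToyDataset({"a": [1, 2, 3], "b": [10, 20]})
names = ds.pipe(lambda d: sorted(d.data_vars))  # func sees the whole object
totals = ds.apply(sum)  # func sees one variable at a time
```

In this sketch `pipe` returns whatever `func` returns, while `apply` always returns a new container with the same keys.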
And one more method that we want to add but that isn’t finalized yet, currently named `apply_ufunc`:
- Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object
I’d like to have three distinct names that make it clear what these methods do and how they differ. This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130
One proposal: rename `apply` to `map`, and then use `apply` only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add `.apply` methods for handling raw arrays to both Dataset and DataArray. (We could use a separate apply method from `apply_ufunc` to convert `dim` arguments to `axis` and not do automatic broadcasting.)
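As a rough illustration of what converting `dim` arguments to `axis` could mean, the sketch below looks up a dimension name in a tuple of dimension names and passes the resulting position to a function that only understands integer axes. Both `sum_along_axis` and `apply_over_dim` are hypothetical helpers written for this example; nested lists stand in for raw arrays, and nothing here is xarray's API.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def sum_along_axis(data: Matrix, axis: int) -> List[float]:
    """Toy 'raw array' function: sum a 2-D nested list along an integer axis."""
    if axis == 0:
        return [sum(column) for column in zip(*data)]
    if axis == 1:
        return [sum(row) for row in data]
    raise ValueError(f"axis must be 0 or 1, got {axis}")


def apply_over_dim(
    func: Callable[..., List[float]],
    data: Matrix,
    dims: Sequence[str],
    dim: str,
) -> List[float]:
    """Hypothetical helper: translate a dimension name into a positional axis."""
    if dim not in dims:
        raise ValueError(f"unknown dimension {dim!r}; expected one of {tuple(dims)}")
    return func(data, axis=list(dims).index(dim))


data = [[1, 2, 3], [4, 5, 6]]
dims = ("x", "y")  # data[i][j] is indexed by (x, y)
over_y = apply_over_dim(sum_along_axis, data, dims, dim="y")
over_x = apply_over_dim(sum_along_axis, data, dims, dim="x")
```

The applied function never sees dimension names here; only the wrapper knows the mapping from names to positions.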
Issue Analytics
- State:
- Created 7 years ago
- Comments:13 (7 by maintainers)
Top GitHub Comments
I don’t think we should consider ourselves beholden to pandas’s bad names, but we should definitely try to preserve backwards compatibility and interpretability for users.
Going back to Python itself:
- `apply(func, args, kwargs)` (from Python 2.x) is equivalent to `func(*args, **kwargs)`
- `map()` maps a function over each element of an iterable
- `functools.reduce()` applies a binary function repeatedly to convert an iterable into a single element

For xarray, we need:
Currently, we call both (1) and (2) `apply()`, which is pretty confusing, and use `reduce()` for (3) even though it could potentially be a special case of (1) with a bit of extra magic and is quite unlike `functools.reduce`. In contrast, pandas calls both (1) and (2) `apply()` (using `raw=True`/`raw=False` to distinguish), and calls (3) `aggregate` or `agg`.

So long term, it could make sense to rename the current `Dataset.apply()`/`GroupBy.apply()` (case 2) to `.map`, and also rename `.reduce()` to the more generic `.aggregate()`.

That said, I’m trying to imagine what the transition process for switching to new behavior for `Dataset.apply` looks like. We already will re-add dimensions to the output from calling functions in `apply()`, but at some point we have to do a hard cut-off from passing `DataArray` objects to the function in `apply` to passing in a raw array.

I suppose we could do this by adding a `raw` keyword-only argument to `.apply()`:
- With `raw=False` (current default), we would raise a warning about changing behavior and would pass on `DataArray` objects to the applied function. Users would be encouraged to use `.map()` instead.
- With `raw=True` (future default behavior), we would pass in raw numpy/dask arrays to the function. The `dim` argument might only be supported with `raw=True`.

We would end up with an extraneous `raw` argument, which we could remove/deprecate at our leisure.

Another option is to keep `apply` as-is for Dataset and GroupBy objects, but add a separate `apply_raw` method for applying functions that act on “raw” arrays. This would be a little more similar to pandas’ `apply` with `raw=True`.

We could even do the `raw=True` keyword argument like pandas, but this is a little awkward because there are some additional arguments on `apply_raw` that don’t make sense on `apply` (e.g., arguments that specify that some dimensions should be dropped or added).
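The `raw` keyword transition discussed in this comment could look roughly like the following. This is only a sketch under stated assumptions: `ToyArray` and `ToyDataset` are hypothetical classes made up for the example, the warning text and the choice of `FutureWarning` are guesses, and re-attaching the original dimension names assumes the applied function preserves shape.

```python
import warnings
from typing import Any, Callable, Dict, Sequence


class ToyArray:
    """Hypothetical labeled wrapper: raw values plus dimension names."""

    def __init__(self, values: Any, dims: Sequence[str]) -> None:
        self.values = values
        self.dims = tuple(dims)


class ToyDataset:
    """Hypothetical container of named ToyArray objects."""

    def __init__(self, data_vars: Dict[str, ToyArray]) -> None:
        self.data_vars = dict(data_vars)

    def map(self, func: Callable[[ToyArray], ToyArray]) -> "ToyDataset":
        # map: func always receives the labeled object
        return ToyDataset({k: func(v) for k, v in self.data_vars.items()})

    def apply(self, func: Callable[[Any], Any], *, raw: bool = False) -> "ToyDataset":
        if not raw:
            # Old behavior: warn, then fall back to passing labeled objects
            warnings.warn(
                "apply() will pass raw arrays in the future; use map() "
                "to keep receiving labeled objects",
                FutureWarning,
                stacklevel=2,
            )
            return self.map(func)
        # New behavior: func sees unlabeled values; dimension names are
        # re-attached afterwards (assumes func preserves the shape)
        return ToyDataset(
            {k: ToyArray(func(v.values), v.dims) for k, v in self.data_vars.items()}
        )


ds = ToyDataset({"t": ToyArray([1, 2, 3], ("x",))})
doubled = ds.apply(lambda values: [2 * v for v in values], raw=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    reversed_ds = ds.apply(lambda arr: ToyArray(arr.values[::-1], arr.dims))
```

Making `raw` keyword-only keeps positional arguments free for the applied function, which is one reason to prefer it over a positional flag in a design like this.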