"Reverse" groupby method for split/apply/combine
When dealing with high-dimensional data, algorithms often involve operations or aggregation on a particular dimension only, whilst keeping all other dimensions in the dataset.
For example, I might know that I want to average all data along the time axis, and I’m indifferent to the other dimensions present, i.e. I want my algorithm to work whenever there is a time axis, and to be indifferent to the presence/lack of any other dimensions.
Mapping this kind of implementation to xarray is awkward though, because I can only use groupby() for the split/apply/combine operation.
For example, in xarray I have to do this:
averages = dataarray.groupby([dimensions excluding time dimension]).apply(my_method_that_works_on_time_dimension)
instead of this (where aggregate_over() is my “reverse” groupby method):
averages = dataarray.aggregate_over([time_dimension]).apply(my_method_that_works_on_time_dimension)
For the first example I have to do some extra work: I have to write additional code to fetch all the dimensions in the array, remove the time dimension from that list, and then use that list with groupby, in order to make my code depend on the time dimension only.
It would be really helpful to add an aggregate_over() method (name TBD of course!) as an alternative to groupby() that automates this extra work.
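For illustration, here is a minimal sketch of what such a helper could look like. Note that aggregate_over below is a hypothetical free function, not part of xarray, and that it swaps groupby for xr.apply_ufunc with input_core_dims. It also assumes the applied function reduces over an axis keyword the way np.mean does, which is narrower than a general apply().

```python
import numpy as np
import xarray as xr


def aggregate_over(dataarray, dims, func):
    """Hypothetical helper (not an xarray API): reduce `dataarray` along `dims` only.

    Assumes `func` accepts an `axis` keyword and reduces over it, like np.mean.
    """
    dims = [dims] if isinstance(dims, str) else list(dims)
    # apply_ufunc moves the core dimensions to the end of the array,
    # so the reduction targets the trailing axes.
    axes = tuple(range(-len(dims), 0))
    return xr.apply_ufunc(
        func,
        dataarray,
        input_core_dims=[dims],
        kwargs={"axis": axes},
    )


# Example data with a "time" axis plus two other, arbitrary dimensions.
dataarray = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("x", "y", "time"),
)

# The manual bookkeeping described above: every dimension except "time".
other_dims = [d for d in dataarray.dims if d != "time"]

# With the helper, only the time dimension has to be named.
averages = aggregate_over(dataarray, "time", np.mean)
```

The same call works unchanged whether the array has zero, one or many dimensions besides time, which is the behaviour I am after.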
Issue Analytics
- Created 7 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
I agree, this would be great! See https://github.com/pydata/xarray/issues/324 for more discussion. I proposed calling this group_over, but it’s essentially the same idea.
It should be relatively straightforward to implement once #818 is finished, but it will also require support for multiple groupby arguments, beyond just support for multi-dimensional arguments.
Closing as dupe of #324 (group_over)