"Reverse" groupby method for split/apply/combine

When dealing with high-dimensional data, algorithms often involve operations or aggregation on a particular dimension only, whilst keeping all other dimensions in the dataset.

For example, I might know that I want to average all data along the time axis, and I’m indifferent to the other dimensions present, i.e. I want my algorithm to work whenever there is a time axis, and to be indifferent to the presence/lack of any other dimensions.

Mapping this kind of implementation to xarray is awkward though because I can only use groupby() for the split/apply/combine operation.

For example, in xarray I have to do this:

averages = dataarray.groupby([dimensions excluding time dimension]).apply(my_method_that_works_on_time_dimension)

instead of this (where aggregate_over() is my “reverse” groupby method):

averages = dataarray.aggregate_over([time_dimension]).apply(my_method_that_works_on_time_dimension)

For the first example I have to do some extra work: I have to write additional code to fetch all the dimensions in the array, remove the time dimension from that list, and then use that list with groupby, in order to make my code depend on the time dimension only.
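The extra work described above can be sketched as a small helper. This is only an illustration, not part of xarray: the name dims_excluding is hypothetical, and it assumes the dimension names are available as a sequence of strings (as dataarray.dims provides in xarray).

```python
from typing import Iterable, List, Sequence


def dims_excluding(dims: Sequence[str], excluded: Iterable[str]) -> List[str]:
    """Return the names in `dims` that are not in `excluded`, preserving order.

    Hypothetical helper for illustration only; not an xarray function.
    """
    excluded = set(excluded)
    # Fail early if the algorithm's required dimension is absent.
    missing = excluded - set(dims)
    if missing:
        raise ValueError(f"dimensions not present: {sorted(missing)}")
    return [d for d in dims if d not in excluded]


# Stand-in for `dataarray.dims`; with a real DataArray you would pass that instead.
example_dims = ("time", "lat", "lon")
group_dims = dims_excluding(example_dims, ["time"])
```

Here group_dims is the list that would have to be passed to groupby() in the first example, which is the step the proposed method would automate.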

It would be really helpful to add an aggregate_over() method (name TBD of course!) as an alternative to groupby() that automates this extra work.

Issue Analytics

State:
Created 7 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions

shoyer commented, Apr 18, 2016

I agree, this would be great! See https://github.com/pydata/xarray/issues/324 for more discussion – I proposed calling this group_over but it’s essentially the same idea.

It should be relatively straightforward to implement once we finish #818, but it will also require support for multiple groupby arguments, beyond just support for multi-dimensional arguments.

0 reactions

dcherian commented, Oct 4, 2020

Closing as dupe of #324 (group_over)