"Reverse" groupby method for split/apply/combine

When dealing with high-dimensional data, algorithms often involve operations or aggregation on a particular dimension only, whilst keeping all other dimensions in the dataset.

For example, I might know that I want to average all data along the time axis, and I’m indifferent to the other dimensions present, i.e. I want my algorithm to work whenever there is a time axis, and to be indifferent to the presence/lack of any other dimensions.

Mapping this kind of implementation to xarray is awkward though because I can only use groupby() for the split/apply/combine operation.

For example, in xarray I have to do this:

averages = dataarray.groupby([dimensions excluding time dimension]).apply(my_method_that_works_on_time_dimension)

instead of this (where aggregate_over() is my “reverse” groupby method):

averages = dataarray.aggregate_over([time_dimension]).apply(my_method_that_works_on_time_dimension)

For the first example I have to do some extra work: I have to write additional code to fetch all the dimensions in the array, remove the time dimension from that list, and then use that list with groupby, in order to make my code depend on the time dimension only.
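
The extra work described above can be sketched roughly as follows. This is only an illustration: `dims_excluding` is a hypothetical helper name, not part of xarray, and a plain tuple stands in for `dataarray.dims` (which in xarray is a tuple of dimension names).

```python
def dims_excluding(all_dims, excluded):
    """Return the names in `all_dims` that are not in `excluded`, preserving order.

    Hypothetical helper for illustration; not an xarray function.
    """
    excluded = set(excluded)
    missing = excluded - set(all_dims)
    if missing:
        # Fail loudly if the caller names a dimension the array does not have.
        raise ValueError(f"unknown dimensions: {sorted(missing)}")
    return [d for d in all_dims if d not in excluded]


# Stand-in for `dataarray.dims`; the dimension names are made up for this example.
example_dims = ("time", "lat", "lon")
group_dims = dims_excluding(example_dims, ["time"])
print(group_dims)
```

The resulting list is what would have to be handed to groupby() so that the rest of the code depends on the time dimension only.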

It would be really helpful to add an aggregate_over() method (name TBD of course!) as an alternative to groupby() that automates this extra work.
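
To make the requested behaviour concrete, here is a minimal sketch of what such a helper might do, written against plain NumPy arrays rather than xarray objects. The function signature, the `(values, dims)` representation and the dimension names are all assumptions made for illustration; this is not how xarray implements or would implement the feature.

```python
import numpy as np


def aggregate_over(values, dims, dim, func):
    """Apply `func` to every 1-D slice along `dim`, keeping all other dimensions.

    Hypothetical sketch: `values` is a NumPy array and `dims` a tuple of
    dimension names, one per axis of `values`.
    """
    if dim not in dims:
        raise ValueError(f"dimension {dim!r} not found in {dims}")
    axis = dims.index(dim)
    # func receives each 1-D slice along `axis`; a scalar result drops that axis.
    reduced = np.apply_along_axis(func, axis, values)
    remaining = tuple(d for d in dims if d != dim)
    return reduced, remaining


# Made-up data: 2 x 3 grid points with 4 time steps each.
values = np.arange(24, dtype=float).reshape(2, 3, 4)
dims = ("x", "y", "time")
averages, remaining_dims = aggregate_over(values, dims, "time", np.mean)
print(averages.shape, remaining_dims)
```

The caller names only the dimension it cares about, and the same call works whether the array has one extra dimension or several.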

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
shoyer commented, Apr 18, 2016

I agree, this would be great! See https://github.com/pydata/xarray/issues/324 for more discussion – I proposed calling this group_over but it’s essentially the same idea.

It should be relatively straightforward to implement once we finish #818, but it will also require support for multiple groupby arguments, beyond just support for multi-dimensional arguments.

0 reactions
dcherian commented, Oct 4, 2020

Closing as dupe of #324 (group_over)
