API: DataFrameGroupBy column subset selection with single list?

I wouldn’t be surprised if there is already an issue about this, but couldn’t directly find one.

When selecting a subset of columns on a DataFrameGroupBy object, both comma-separated keys (which arrive as a tuple inside the __getitem__ [] brackets) and double square brackets (a list inside the __getitem__ [] brackets) seem to work:

In [1]: import numpy as np

In [2]: import pandas as pd

In [6]: df = pd.DataFrame(np.random.randint(10, size=(10, 4)), columns=['a', 'b', 'c', 'd'])

In [8]: df.groupby('a').sum()
Out[8]: 
    b   c   d
a            
0   0   5   7
3  18   6  12
4  16   6   9
6  10  11  11
9   3   3   0

In [9]: df.groupby('a')['b', 'c'].sum()
Out[9]: 
    b   c
a        
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3

In [10]: df.groupby('a')[['b', 'c']].sum()
Out[10]: 
    b   c
a        
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3

Personally I find this df.groupby('a')['b', 'c'].sum() a bit strange, and inconsistent with how DataFrame indexing works.

Of course, on a DataFrameGroupBy you don’t have the possible confusion with indexing multiple dimensions (rows, columns), but still.
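To make the inconsistency concrete, here is a small sketch of how a plain DataFrame treats the two spellings. The frame contents are made up for illustration (they are not the random data from the session above), and the flag name `tuple_lookup_failed` is just a helper for this example.

```python
import numpy as np
import pandas as pd

# Illustrative frame; values are arbitrary and differ from the session above.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

# A list inside the brackets selects several columns.
subset = df[['b', 'c']]

# Comma-separated keys arrive as the single tuple ('b', 'c'), which a
# DataFrame with flat columns looks up as ONE column label.
# No such column exists here, so the lookup fails.
try:
    df['b', 'c']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True
```

So on a DataFrame only the list form selects multiple columns, which is why accepting the tuple form on a DataFrameGroupBy looks out of place.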

cc @jreback @WillAyd

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 18 (16 by maintainers)

Top GitHub Comments

jreback commented, Dec 29, 2019 (1 reaction)

@yehoshuadimarsky we have 3000+ issues and constant comments - to be honest we barely have time to triage on the PRs

even really important things are not necessarily discussed at length

just like everyone else has limited time - the best way to prompt a discussion is to push a change

yehoshuadimarsky commented, Dec 25, 2019 (1 reaction)

So this is my first time working on pandas code, and I’m a little confused here, so please bear with me. I’m also new to linking to code on GitHub.

As I understand it, when you index an object with brackets and pass in several comma-separated keys, Python implicitly packs them into a single tuple before calling __getitem__. So df['a','b'] is really df[('a','b')] under the hood.
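This is plain Python behaviour and can be checked without pandas. The sketch below uses a hypothetical `ShowKey` class whose `__getitem__` simply hands back whatever key it received.

```python
class ShowKey:
    """Hypothetical helper: returns the key exactly as __getitem__ receives it."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

# Comma-separated keys are packed into one tuple before the call.
tuple_key = probe['b', 'c']

# Double brackets pass a list as the single key.
list_key = probe[['b', 'c']]
```

Here `tuple_key` is the tuple `('b', 'c')` and `list_key` is the list `['b', 'c']`, so any class that wants to treat the two spellings alike has to do so explicitly inside `__getitem__`.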

I’m having trouble tracing the code path to figure out where exactly __getitem__ on the GroupBy is implemented. This is what I have so far:

  1. DataFrame.groupby is called on the superclass NDFrame here
  2. This eventually creates the specific DataFrameGroupBy object here
  3. Which is a subclass of GroupBy
  4. Which is a subclass of _GroupBy
  5. Which has the mixin named SelectionMixin, defined here
  6. Which implements __getitem__ here
  7. Which, if the key is a list or tuple, returns self._gotitem(list(key), ndim=2)
  8. self._gotitem needs to be implemented by the respective subclasses, which in this case is the DataFrameGroupBy object, and is implemented here
  9. But all this does is simply create an instance of itself (DataFrameGroupBy) with the key (a list/tuple) passed as a slice to the selection parameter
  10. The selection parameter is implemented in the parent _GroupBy object, which sets the internal self._selection attribute to the key here
  11. This is where I’m lost. How does this actually slice the object and only return a subset of it?

Any help here would be greatly appreciated. Thanks.
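One plausible reading of steps 7 to 11 is that nothing gets sliced when the brackets are used: the key is only stored, and it is consulted later when a result is computed. The toy model below illustrates that pattern in pure Python. It is not pandas code: `ToyGroupBy`, `_selected_columns` and the row-of-dicts layout are all made up for this sketch, and wrapping a scalar key in a list is a simplification.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; not pandas code."""

    def __init__(self, rows, by, selection=None):
        self.rows = rows              # list of dicts, one dict per row
        self.by = by                  # name of the grouping column
        self._selection = selection   # stored here, not applied yet

    def __getitem__(self, key):
        # A tuple key (gb['b', 'c']) and a list key (gb[['b', 'c']]) are both
        # normalised with list(key), mirroring step 7 above.
        if isinstance(key, (list, tuple)):
            return ToyGroupBy(self.rows, self.by, selection=list(key))
        # Simplification: a scalar key is wrapped in a one-element list.
        return ToyGroupBy(self.rows, self.by, selection=[key])

    def _selected_columns(self):
        # The stored selection is only looked at when a result is needed.
        if self._selection is not None:
            return self._selection
        return [col for col in self.rows[0] if col != self.by]

    def sum(self):
        cols = self._selected_columns()
        out = {}
        for row in self.rows:
            totals = out.setdefault(row[self.by], dict.fromkeys(cols, 0))
            for col in cols:
                totals[col] += row[col]
        return out


rows = [
    {'a': 0, 'b': 1, 'c': 2, 'd': 3},
    {'a': 0, 'b': 4, 'c': 5, 'd': 6},
    {'a': 3, 'b': 7, 'c': 8, 'd': 9},
]
gb = ToyGroupBy(rows, by='a')
from_tuple = gb['b', 'c'].sum()
from_list = gb[['b', 'c']].sum()
everything = gb.sum()
```

In this sketch both spellings produce the same sums because `__getitem__` converts the key with `list(key)` before storing it, and the subsetting happens inside `sum()` rather than at indexing time. Whether pandas applies its stored `_selection` in the same place would need to be confirmed in the pandas source.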


