API: DataFrameGroupBy column subset selection with single list?
I wouldn't be surprised if there is already an issue about this, but I couldn't directly find one.
When doing a subselection of columns on a DataFrameGroupBy object, both a plain list of labels (so a tuple within the __getitem__ [] brackets) and double square brackets (a list inside the __getitem__ [] brackets) seem to work:
In [6]: df = pd.DataFrame(np.random.randint(10, size=(10, 4)), columns=['a', 'b', 'c', 'd'])
In [8]: df.groupby('a').sum()
Out[8]:
    b   c   d
a
0   0   5   7
3  18   6  12
4  16   6   9
6  10  11  11
9   3   3   0
In [9]: df.groupby('a')['b', 'c'].sum()
Out[9]:
    b   c
a
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3
In [10]: df.groupby('a')[['b', 'c']].sum()
Out[10]:
    b   c
a
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3
Personally I find this df.groupby('a')['b', 'c'].sum() a bit strange, and inconsistent with how DataFrame indexing works.
Of course, on a DataFrameGroupBy you don’t have the possible confusion with indexing multiple dimensions (rows, columns), but still.
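To make the inconsistency concrete, here is a small sketch. The fixed data values are an assumption chosen for reproducibility, not the random frame from the session above. On a plain DataFrame with ordinary (non-MultiIndex) columns, a bare pair of labels inside the brackets is treated as one tuple key and fails, while a list selects two columns. How the tuple form behaves on a groupby object depends on the pandas version, so only the list form is shown for groupby.

```python
import pandas as pd

# Small fixed frame (values chosen for illustration only).
df = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [1, 2, 3, 4],
    "c": [5, 6, 7, 8],
    "d": [9, 10, 11, 12],
})

# DataFrame indexing: df['b', 'c'] is df[('b', 'c')], i.e. ONE key that is a tuple.
# With flat column labels there is no column named ('b', 'c'), so this raises KeyError.
try:
    df["b", "c"]
    tuple_key_ok = True
except KeyError:
    tuple_key_ok = False

# A list is the usual way to select several columns from a DataFrame.
subset = df[["b", "c"]]

# On a DataFrameGroupBy, the list form selects the same two columns.
grouped = df.groupby("a")[["b", "c"]].sum()

print(tuple_key_ok)
print(list(subset.columns))
print(grouped)
```

The list spelling reads the same on both objects, which is the consistency argument made above.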
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:18 (16 by maintainers)
Top GitHub Comments
@yehoshuadimarsky we have 3000+ issues and constant comments. To be honest, we barely have time to triage the PRs, and even really important things are not necessarily discussed at length.
Everyone has limited time, so the best way to prompt a discussion is to push a change.
So this is my first time working on pandas code, and I’m a little confused here, so please bear with me. I’m also new to linking to code on GitHub.
As I understand it, when __getitem__ is called on an object by using brackets and you pass in several keys, they are implicitly packed into a single tuple key. So df['a','b'] is really df[('a','b')] under the hood.
I'm having trouble tracing the code path to figure out where exactly the __getitem__ on the GroupBy is actually implemented here:
- DataFrame.groupby is called on the superclass NDFrame here
- which returns a DataFrameGroupBy object here
- which is a subclass of GroupBy
- which is a subclass of _GroupBy
- which is a subclass of SelectionMixin, defined here
- which implements __getitem__ here
- which, if the key is a list or tuple, returns self._gotitem(list(key), ndim=2)
- self._gotitem needs to be implemented by the respective subclasses, which in this case is the DataFrameGroupBy object, and is implemented here
- which returns an instance of itself (DataFrameGroupBy) with the key (a list/tuple) passed as a slice to the selection parameter
- the selection parameter is implemented in the parent _GroupBy object, which sets the internal self._selection attribute to the key here
Any help here would be greatly appreciated. Thanks.
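The dispatch described in this comment can be sketched with toy classes. This is not the pandas source: ToyGroupBy and its constructor are hypothetical names, and only the method names __getitem__ and _gotitem and the attribute _selection are taken from the comment. The sketch shows why both spellings end up equivalent: the bracket syntax hands a tuple to __getitem__, and the list/tuple branch converts it to a list before delegating.

```python
class SelectionMixin:
    """Toy stand-in for the mixin described above (not the pandas source)."""

    _selection = None

    def __getitem__(self, key):
        # obj['b', 'c'] arrives here as the single tuple ('b', 'c');
        # obj[['b', 'c']] arrives as the list ['b', 'c'].
        if isinstance(key, (list, tuple)):
            # Both spellings are normalised to a list, so they behave alike.
            return self._gotitem(list(key), ndim=2)
        return self._gotitem(key, ndim=1)

    def _gotitem(self, key, ndim):
        raise NotImplementedError("subclasses provide _gotitem")


class ToyGroupBy(SelectionMixin):
    """Hypothetical subclass playing the role of DataFrameGroupBy."""

    def __init__(self, by, selection=None):
        self.by = by
        self._selection = selection

    def _gotitem(self, key, ndim):
        # Return a new instance of the same class carrying the selection.
        return type(self)(self.by, selection=key)


gb = ToyGroupBy("a")
from_tuple = gb["b", "c"]    # key is the tuple ('b', 'c')
from_list = gb[["b", "c"]]   # key is the list ['b', 'c']
single = gb["b"]             # key is the string 'b'

print(from_tuple._selection)
print(from_list._selection)
print(single._selection)
```

In this sketch, telling the two spellings apart would only require treating tuple and list differently in the isinstance check, which is presumably where a change to the real behaviour would start.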