DOC: Order of groups in groupby and head method

When sort=True is passed to groupby (the default), the groups are in sorted order. If you loop through them, they are in sorted order; if you compute the mean, std, and so on, they are in sorted order; but if you use the head method, they are NOT in sorted order.


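import pandas as pd

# Group key 2 appears before group key 1 in the original row order.
df = pd.DataFrame([[2, 100], [2, 200], [2, 300], [1, 400], [1, 500], [1, 600]], columns=['A', 'B'])

grouped = df.groupby(df['A'], sort=True)
for name, group in grouped:
    print(group)          # iteration follows sorted key order: 1, then 2
print(grouped.mean())     # reducers are also in sorted key order
print(grouped.head(1))    # head keeps the original row order: key 2, then key 1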

Is this expected? I have not found this behaviour documented. I find it confusing, and it caused me a headache: I was combining the output of the mean and head methods in the same DataFrame, and because the data was not sorted beforehand, the results got mixed up by this ordering difference. I am using pandas 0.20.3.

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
tdpetrou commented, Nov 16, 2017

There is a big problem with the docstrings here for DataFrameGroupBy.head. They say:

Essentially equivalent to ``.apply(lambda x: x.head(n))``,
except ignores as_index flag.

DataFrameGroupBy.head keeps the original ordering of the DataFrame; it doesn't even order by the keys. Using .apply(lambda x: x.head(n)) puts the group keys in the index and sorts them.

Edit: I see that @Ifnister already pointed this out. I think it would make a lot more sense for head to actually do .apply(lambda x: x.head(n)).
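
To make the difference described above concrete, here is a minimal sketch that runs both calls on the DataFrame from the question. It assumes a reasonably recent pandas with the default group_keys=True; selecting [["B"]] before apply is a choice made here only to keep the grouping column out of the applied frames, and the variable names head_rows and applied are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 100], [2, 200], [2, 300], [1, 400], [1, 500], [1, 600]],
    columns=["A", "B"],
)
grouped = df.groupby("A", sort=True)

# head() behaves like a filter: it returns rows of the original frame,
# keeping the original index and the original row order.
head_rows = grouped.head(1)
print(head_rows)

# apply() combines the per-group results, so the group keys end up in the
# index and follow the sorted key order.
applied = grouped[["B"]].apply(lambda g: g.head(1))
print(applied)
```

In this sketch head_rows lists the row for key 2 first, because that row comes first in df, while applied lists key 1 first.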

0 reactions
rhshadrach commented, Oct 20, 2022

Head is a filter; sort is only applied to reducers within groupby. To my knowledge this isn’t documented, but I haven’t checked. I think documentation on this should be added to the Series.groupby and DataFrame.groupby API docs as well as the User Guide.
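
The filter versus reducer distinction can be checked with a short sketch, and it also suggests one possible way to combine mean and head results safely: align them on the group key rather than on row position. This is an illustration under the assumption that each group contributes exactly one row from head(1); the column names B_mean and B_first are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 100], [2, 200], [2, 300], [1, 400], [1, 500], [1, 600]],
    columns=["A", "B"],
)

# Reducers honor the sort flag: keys are sorted with sort=True and appear
# in order of first occurrence with sort=False.
means = df.groupby("A", sort=True)["B"].mean()
means_unsorted = df.groupby("A", sort=False)["B"].mean()

# head() ignores the sort flag, so index its rows by the group key and let
# join() align the two results by label instead of by position.
first_rows = df.groupby("A", sort=True).head(1).set_index("A")
combined = means.to_frame("B_mean").join(first_rows["B"].rename("B_first"))
print(combined)
```

Because join matches on the index labels, each mean is paired with the first row of the same group regardless of the order in which head returned the rows.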
