Sort doesn't work after agg without ungroup
I am not sure if this is a bug, but it is definitely not a feature request, so I am filing it as a bug report.
Problem
I expected the following code to return the counts per primary_type, sorted by count,
but that is not the case: I get an unsorted list. Is this expected behaviour?
from clumper import Clumper

(
    Clumper.read_jsonl("https://calmcode.io/datasets/pokemon.jsonl")
    .mutate(primary_type=lambda c: c['type'][0])
    .group_by('primary_type')
    .agg(occurence=('primary_type', 'count'))
    # still grouped at this point, so the result does not come out sorted
    .sort(key=lambda x: x['occurence'])
    .collect()
)
Additional context
Adding ungroup after agg and before sort solves it. The following code produces what I want.
from clumper import Clumper

(
    Clumper.read_jsonl("https://calmcode.io/datasets/pokemon.jsonl")
    .mutate(primary_type=lambda c: c['type'][0])
    .group_by('primary_type')
    .agg(occurence=('primary_type', 'count'))
    # drop the grouping so the sort applies to the whole collection
    .ungroup()
    .sort(key=lambda x: x['occurence'])
    .collect()
)
Issue Analytics
- State:
- Created 3 years ago
- Comments:8 (8 by maintainers)
Top GitHub Comments
@samarpan-rai yeah I might argue this is expected behaviour. Just to check, have you seen this guide on the docs?
The .agg operation does not remove the grouping. It’s a bit of extra work but I found that in the end, groups should be dealt with explicitly. Otherwise the user might need to do a lot of accounting in their mind. One way of looking at it: it’s still better than having to call .reset_index().

@samukweku just mentioning it as a by the way, but on GitHub you can get pretty syntax highlighting for python if you add this at the start of a code snippet.
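
To illustrate the point about .agg keeping the grouping, here is a minimal plain-Python sketch of what a sort that respects groups does. It is not Clumper's actual implementation: the helper sort_within_groups and the counts in rows are made up for illustration. After an aggregation each group holds a single row, so sorting inside each group changes nothing, while sorting the ungrouped rows gives the expected order.

```python
from itertools import groupby

# Made-up aggregated rows: one row per group, as left behind by an aggregation.
rows = [
    {"primary_type": "Grass", "occurence": 66},
    {"primary_type": "Fire", "occurence": 47},
    {"primary_type": "Water", "occurence": 105},
]


def sort_within_groups(rows, group_key, sort_key):
    """Sort rows inside each group, keeping the groups in their original order.

    Hypothetical helper; assumes rows of the same group are adjacent.
    """
    out = []
    for _, members in groupby(rows, key=lambda r: r[group_key]):
        out.extend(sorted(members, key=sort_key))
    return out


# Grouping still active: every group has one row, so the order is unchanged.
grouped_sort = sort_within_groups(rows, "primary_type", lambda r: r["occurence"])

# Grouping removed: the sort sees all rows at once.
ungrouped_sort = sorted(rows, key=lambda r: r["occurence"])

print([r["occurence"] for r in grouped_sort])    # [66, 47, 105]
print([r["occurence"] for r in ungrouped_sort])  # [47, 66, 105]
```

Under that assumption, the first snippet in the issue corresponds to the grouped case and the second, with .ungroup(), to the ungrouped case.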