Sort doesn't work after agg without ungroup

I am not sure if this is a bug, but it is definitely not a feature request, so I am writing it as a bug report.

Problem

I was expecting the following code to return a list of counts, grouped by the primary_type key and sorted by count, but that is not the case: I get an unsorted list. Is this expected behaviour?

from clumper import Clumper

(
    Clumper.read_jsonl("https://calmcode.io/datasets/pokemon.jsonl")
    .mutate(primary_type=lambda c: c['type'][0])
    .group_by('primary_type')
    .agg(occurence=('primary_type', 'count'))
    .sort(key=lambda x: x['occurence'])
    .collect()
)

Additional context

Adding ungroup after agg and before sort solves it. The following code produces what I want.

from clumper import Clumper

(
    Clumper.read_jsonl("https://calmcode.io/datasets/pokemon.jsonl")
    .mutate(primary_type=lambda c: c['type'][0])
    .group_by('primary_type')
    .agg(occurence=('primary_type', 'count'))
    .ungroup()  # drop the grouping so .sort() sees all rows at once
    .sort(key=lambda x: x['occurence'])
    .collect()
)
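For comparison, the count-per-group followed by a single flat sort can be sketched in plain Python with only the standard library. This is a rough stand-in for what the working pipeline computes, not clumper's implementation, and the sample rows are hypothetical (the real dataset is loaded from the URL in the snippets above).

```python
from collections import Counter

# Hypothetical sample rows standing in for pokemon.jsonl; only "type" is used.
pokemon = [
    {"name": "squirtle", "type": ["water"]},
    {"name": "charmander", "type": ["fire"]},
    {"name": "bulbasaur", "type": ["grass", "poison"]},
    {"name": "psyduck", "type": ["water"]},
    {"name": "vulpix", "type": ["fire"]},
    {"name": "poliwag", "type": ["water"]},
]

# Roughly .mutate(primary_type=...) + .group_by(...) + .agg(... 'count'):
# count rows per primary type (Counter keeps first-seen order).
counts = Counter(p["type"][0] for p in pokemon)

# Roughly .ungroup(): one flat list of dicts, one per type.
# The key keeps the issue's spelling "occurence".
summary = [{"primary_type": t, "occurence": n} for t, n in counts.items()]

# With no grouping left, a single sort orders every row.
result = sorted(summary, key=lambda x: x["occurence"])
print(result)
# [{'primary_type': 'grass', 'occurence': 1}, {'primary_type': 'fire', 'occurence': 2}, {'primary_type': 'water', 'occurence': 3}]
```

Before the sort, the rows come out in first-seen order (water, fire, grass), which is the kind of "unsorted" result the report describes.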

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
samukweku commented, Aug 19, 2020
@koaning thanks for that. And to think I use it all the time when writing documentation and yet fail to use it when making comments 🤦‍♂
1 reaction
koaning commented, Aug 19, 2020

@samarpan-rai yeah, I might argue this is expected behaviour. Just to check, have you seen this guide on the docs?

The .agg operation does not remove the grouping. It's a bit of extra work, but I found that in the end groups should be dealt with explicitly; otherwise the user might need to do a lot of accounting in their mind. Looked at one way, it's still better than having to call .reset_index() as in pandas.
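Why the grouping matters can be shown with a toy model. It assumes, as the thread implies, that a sort on a still-grouped collection is applied within each group; `grouped_sort` below is a hypothetical helper written for illustration, not clumper's code. After `.agg`, every group holds exactly one row, so sorting within groups cannot move anything.

```python
from operator import itemgetter


def grouped_sort(rows, group_key, sort_key):
    """Hypothetical group-aware sort (an assumption, not clumper's code):
    rows are sorted within each group, and the groups themselves keep the
    order in which they first appear."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    result = []
    for members in groups.values():
        result.extend(sorted(members, key=sort_key))
    return result


# Hypothetical output of .agg(): exactly one row per primary_type group.
aggregated = [
    {"primary_type": "water", "occurence": 3},
    {"primary_type": "grass", "occurence": 1},
    {"primary_type": "fire", "occurence": 2},
]
by_count = itemgetter("occurence")

# Grouping still active: each one-row group is "sorted", so nothing moves.
still_grouped = grouped_sort(aggregated, "primary_type", by_count)
print([r["primary_type"] for r in still_grouped])  # ['water', 'grass', 'fire']

# Grouping dropped (the role .ungroup() plays): one sort over all rows.
ungrouped = sorted(aggregated, key=by_count)
print([r["primary_type"] for r in ungrouped])  # ['grass', 'fire', 'water']
```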

@samukweku just mentioning it as a by-the-way, but on GitHub you can get pretty syntax highlighting for Python if you add this at the start of a code snippet:

```python

