Do not pass lambdas to the backend in case of basic GroupBy aggregation
The current GroupBy implementation still passes Python lambdas to the backend for simple aggregations (for example, mean):
https://github.com/modin-project/modin/blob/89f6cdde56abf5dcf9561ea6721fb31f84a855d5/modin/pandas/groupby.py#L128-L129
These lambdas cannot be processed by non-Python engines, so even if a backend supports a particular aggregation, it cannot execute it because the operation arrives as a Python function. At the moment, all of those lambdas are passed into a single method, QueryCompiler.groupby_agg.
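To illustrate the problem described above, here is a minimal sketch using a toy backend. The class `ToyNativeBackend`, its `SUPPORTED` set, and the returned strings are hypothetical and are not part of Modin; the method name only mirrors `groupby_agg` from the issue text. The sketch shows why an aggregation identified by name can be dispatched to a native implementation, while an opaque lambda cannot.

```python
from typing import Callable, Union


class ToyNativeBackend:
    """Hypothetical stand-in for a non-Python engine (not a Modin class)."""

    # Aggregations this toy engine can run natively (assumed for illustration).
    SUPPORTED = {"mean", "sum", "count"}

    def groupby_agg(self, agg: Union[str, Callable]) -> str:
        # A named aggregation can be matched against the engine's own operations.
        if isinstance(agg, str) and agg in self.SUPPORTED:
            return f"native:{agg}"
        # A Python callable is opaque: the engine cannot tell which operation
        # it performs, so the only option left is a Python fallback.
        return "fallback:python"


backend = ToyNativeBackend()

# The lambda is never called here; it is only inspected by the dispatcher.
by_lambda = backend.groupby_agg(lambda df: df.mean())
by_name = backend.groupby_agg("mean")
```

In this sketch both calls request the same aggregation, but only the named one reaches the native path, which is the gap the issue asks to close.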
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (11 by maintainers)
Can we go ahead with #3373 then and fix the bloat as a separate refactoring step? It would unblock groupby implementations in OmniSci immediately while we figure out how to better split the beast.
We likely want to keep the QC API as clean as possible, and all the implementations should go to a different location. We could move those implementations to the kernels folder after #2957 is merged. Something like this: