ENH: Group By Grouping Sets / Cube / Rollup

Problem description

GROUPING SETS, CUBE, and ROLLUP are SQL features for aggregating over several dimensions in a single query, and pandas does not provide them out of the box. Some of these results can be produced with a pivot table and the melt/stack functions, but since pandas is an analysis tool, these operations should be available directly. They would also reduce the number of lines of code needed.
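As an illustration of the pivot-table route mentioned above, the sketch below uses `pandas.pivot_table` with `margins=True`, which adds row and column subtotals plus a grand total. That is roughly what a CUBE over two columns returns, in wide form. The DataFrame and its column names (`region`, `product`, `sales`) are made up for this example.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "sales": [10, 20, 30, 40],
})

# margins=True appends an "All" row and an "All" column that hold the
# per-region subtotals, the per-product subtotals, and the grand total.
table = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
)
print(table)
```

This only covers the two-dimensional case. Arbitrary grouping sets still require combining several `groupby` results by hand.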

GROUP BY GROUPING SETS lets a query that would otherwise combine several GROUP BY clauses with UNION statements be written as a single query. CUBE is shorthand for the grouping sets made of every combination of the fields listed in the CUBE clause.

```sql
SELECT
    column1,
    column2,
    aggregate_function (column3)
FROM
    table_name
GROUP BY
    GROUPING SETS (
        (column1, column2),
        (column1),
        (column2),
        ()
);

SELECT
    column1,
    column2,
    column3,
    column4,
    aggregate_function (column5)
FROM
    table_name
GROUP BY
    column1, column2, CUBE (column3, column4);
```
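To make the shorthand concrete, here is a small sketch of how CUBE expands into grouping sets, and how ROLLUP (also named in the issue title) does: CUBE yields every subset of its column list, while ROLLUP yields only the prefixes of that list. The helper names `cube_sets` and `rollup_sets` are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def cube_sets(columns):
    """Return every subset of `columns`, largest first (the CUBE expansion)."""
    cols = list(columns)
    return [
        combo
        for size in range(len(cols), -1, -1)
        for combo in combinations(cols, size)
    ]


def rollup_sets(columns):
    """Return every prefix of `columns`, longest first (the ROLLUP expansion)."""
    cols = tuple(columns)
    return [cols[:size] for size in range(len(cols), -1, -1)]


print(cube_sets(["column3", "column4"]))
# [('column3', 'column4'), ('column3',), ('column4',), ()]
print(rollup_sets(["column3", "column4"]))
# [('column3', 'column4'), ('column3',), ()]
```

A CUBE over n columns therefore produces 2^n grouping sets, and a ROLLUP produces n + 1.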

Current way

```pseudo code
# a is an existing pandas DataFrame
a1 = a.groupby([column1])[column5].sum()
a2 = a.groupby([column1, column2])[column5].sum()
...
an = a.groupby([column1, ..., columnn])[column5].sum()
result = pd.concat([a1, a2, ..., an])
```

Expected way

```pseudo code
# a is an existing pandas DataFrame
# cube, sets, and rollup are the proposed API, not existing pandas methods

groupby_cube1 = a.groupby([column1, column2]).cube([column3, ..., columnn]).sum(column5)
groupby_cube2 = a.groupby.cube([column1, column2, ..., columnn]).sum(column5)

groupby_sets1 = a.groupby.sets({column1, column2}, {column1, column2, column3}, {}).sum(column5)
groupby_sets2 = a.groupby([column1, column2]).sets({column1, column2, column3}, {}).sum(column5)

groupby_rollup1 = a.groupby.rollup({column1, column2, column3}).sum(column5)
groupby_rollup2 = a.groupby([column1, column2]).rollup({column3}).sum(column5)
```
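Until something like the proposed API exists, the grouping-sets behaviour can be approximated with today's pandas. The sketch below is one possible emulation, not an existing pandas feature: `grouping_sets` is a hypothetical helper, and the sample columns (`region`, `product`, `sales`) are assumptions. It runs one `groupby` per grouping set and concatenates the results, so it does not deliver the single-pass performance the issue asks for.

```python
import pandas as pd


def grouping_sets(df, sets, value, aggfunc="sum"):
    """Aggregate `value` once per grouping set and stack the results.

    Hypothetical helper, not part of pandas. Key columns that are absent
    from a grouping set come out as NaN, much like NULL in SQL.
    """
    # Collect every key column in order of first appearance.
    all_keys = []
    for keys in sets:
        for key in keys:
            if key not in all_keys:
                all_keys.append(key)

    frames = []
    for keys in sets:
        keys = list(keys)
        if keys:
            part = df.groupby(keys, as_index=False)[value].agg(aggfunc)
        else:
            # The empty grouping set () is the grand total.
            part = pd.DataFrame({value: [df[value].agg(aggfunc)]})
        frames.append(part)

    # concat aligns on column names and fills the missing keys with NaN.
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.reindex(columns=all_keys + [value])


# Made-up sample data for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "sales": [10, 20, 30, 40],
})

result = grouping_sets(df, [("region", "product"), ("region",), ()], "sales")
print(result)
```

A built-in implementation would also need an equivalent of SQL's `GROUPING()` indicator, because NaN in a key column here cannot distinguish a subtotal row from a row whose key is genuinely missing.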

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

5 reactions
rsdpyenugula commented, Nov 9, 2019

@jreback, I'm not sure why it's complicated. This has already been implemented in open-source databases such as PostgreSQL, so a similar approach could be followed without reinventing the wheel. As I said above, these functions are implemented in a way that helps performance, compared with the concat approach.
pandas is known as an analytics tool, so I strongly believe these functions should be available out of the box. If this API were implemented, it would also help other libraries built on the pandas API, for example Dask. Every piece of software has bugs, and they get fixed in later iterations. In my example above you can see two different API styles. The first is the existing pandas style. The second is a totally new API, so even if it has bugs it will not affect other APIs, and once it is fully implemented without bugs the other existing API can be deprecated.

1 reaction
jreback commented, Feb 22, 2020

@rsdpyenugula if you really want this, then you should edit the top post with detailed examples (quality, not quantity, matters here), doc-strings, and typed function signatures.

A POC implementation PR would also be nice to have.

pandas is all-volunteer, and folks have limited time with 3000+ issues, so contributions are the only thing that will move this forward.
