[ENH] pivot_wider() and pivot_longer()
Brief Description
Pandas has several functions for pivoting. The `melt` function takes a ‘wide’ dataframe and makes it ‘long’, and `pivot_table` takes a ‘long’ dataframe and makes it ‘wide’. Similarly, `stack` and `unstack` perform pivot operations using multilevel indices. Tom Augspurger explains all of the pandas pivoting functions in his Modern Pandas guide. If you look through his guide, you can see that the APIs for these four functions are all different and difficult to remember.
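As a rough illustration of the index-based pair, here is a minimal sketch of `stack` and `unstack` on a toy frame. The frame and its labels (`r1`, `r2`, `x`, `y`) are made up for this example and do not come from the guide.

```python
import pandas as pd

# Toy frame invented for illustration: two rows, two columns.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# stack() moves the column labels into an inner index level,
# giving a Series indexed by (row label, column label).
stacked = df.stack()

# unstack() moves that inner level back out into columns.
restored = stacked.unstack()

print(stacked)
print(restored)
```

Note that neither call takes column names the way `melt` or `pivot_table` does; both work purely on index levels, which is part of why the four APIs feel unrelated.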
I would like to propose `pivot_wider` and `pivot_longer` functions inspired by functions in the R tidyr package. These functions are pure syntactic sugar around pandas `melt` and `pivot`. They should have consistent APIs, and they should be symmetric. That is, you should be able to take a ‘wide’ dataframe, pass it through `pivot_longer`, then pass it through `pivot_wider` to get back to the exact same ‘wide’ dataframe that you started with.
Example API
Imagine that we have a ‘wide’ table of heartrate data for patients treated under two different drugs, a and b.
name | a | b |
---|---|---|
Wilbur | 67 | 56 |
Petunia | 80 | 90 |
Gregory | 64 | 50 |
Here’s how we would convert this to a ‘long’ table.
df.pivot_longer(column_names=['a', 'b'], names_to=['drug'], values_to=['heartrate'])
The output would be a ‘long’ table that looks like this.
name | drug | heartrate |
---|---|---|
Wilbur | a | 67 |
Petunia | a | 80 |
Gregory | a | 64 |
Wilbur | b | 56 |
Petunia | b | 90 |
Gregory | b | 50 |
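For comparison, the same reshaping can be done today with plain pandas `melt`. This is only a sketch of what the proposed `pivot_longer` might wrap, not an implementation of it; the variable names `wide` and `long` are chosen here for illustration.

```python
import pandas as pd

# The 'wide' heartrate table from the example above.
wide = pd.DataFrame(
    {
        "name": ["Wilbur", "Petunia", "Gregory"],
        "a": [67, 80, 64],
        "b": [56, 90, 50],
    }
)

# melt() keeps `name` as an identifier column and unpivots `a` and `b`:
# the old column labels go into `drug`, the cell values into `heartrate`.
long = wide.melt(
    id_vars="name",
    value_vars=["a", "b"],
    var_name="drug",
    value_name="heartrate",
)

print(long)
```

The mapping from the proposed arguments is an assumption: `column_names` would correspond to `value_vars`, `names_to` to `var_name`, and `values_to` to `value_name`.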
Now we’ll take the above table and transform it back into a ‘wide’ table using `pivot_wider`.
df.pivot_wider(names_from=['drug'], values_from=['heartrate'])
And now we’re back to our ‘wide’ table:
name | a | b |
---|---|---|
Wilbur | 67 | 56 |
Petunia | 80 | 90 |
Gregory | 64 | 50 |
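The reverse direction can be sketched with plain pandas `pivot`. One caveat for the symmetry goal: `pivot` sorts the resulting index, so getting back the exact original row order takes an extra step. The code below is an illustration under that assumption, not the proposed implementation; `first_seen` is a name introduced here.

```python
import pandas as pd

# The 'long' heartrate table from the example above.
long = pd.DataFrame(
    {
        "name": ["Wilbur", "Petunia", "Gregory"] * 2,
        "drug": ["a"] * 3 + ["b"] * 3,
        "heartrate": [67, 80, 64, 56, 90, 50],
    }
)

# pivot() sorts the index, so remember the order in which names first appear.
first_seen = long["name"].unique()

wide = (
    long.pivot(index="name", columns="drug", values="heartrate")
    .reindex(first_seen)                      # restore the original row order
    .rename_axis(index="name", columns=None)  # drop the leftover "drug" axis label
    .reset_index()                            # turn `name` back into a column
)

print(wide)
```

Without the `reindex` step the rows would come back alphabetically (Gregory, Petunia, Wilbur), which is the kind of detail a symmetric `pivot_wider` would need to handle.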
I would love some feedback on the API before I start implementing this in code!
@samukweku I realized I didn’t explicitly link what @benjaminjack’s PR was. Here it is: https://github.com/benjaminjack/pyjanitor/commit/e3df817903c20dd21634461c8a92aec137963ed0
@samukweku assigning! If possible, please use as much as you can salvage from @benjaminjack’s PR, and include him in the changelog entry! Sharing the work/credit keeps things encouraging for everyone involved.