BUG: unstack() does not always sort index in 0.23

Code Sample

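import pandas as pd
# Both levels share the values A-E; the outer level only uses D and E.
# Note: the `labels` argument was renamed to `codes` in pandas 0.24.
index = pd.MultiIndex(levels=[['A','B','C','D','E']] * 2, labels=[[4,4,4,3], [4,2,0,1]])
pd.Series(0, index).unstack()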

Problem description

In Pandas 0.20, 0.21, and 0.22, this gave the expected result:

     A    B    C    E
D  NaN  0.0  NaN  NaN
E  0.0  NaN  0.0  0.0

But in Pandas 0.23, the result is not sorted:

     E    C    A    B
E  0.0  0.0  0.0  NaN
D  NaN  NaN  NaN  0.0

The documentation says “The level involved will automatically get sorted”, and while I’ve seen the explanation of confusing implementation details leaking out in #15105 and some other outright bugs in #9514, this seems to be a different bug, and a regression.
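
As a stopgap, the sorted layout can be recovered by sorting the unstacked result explicitly on both axes. The following is only a minimal sketch of that workaround, not a fix for the regression itself. It assumes a pandas version that accepts the `codes` argument (0.24 or later); on 0.23 the `labels` spelling from the code sample applies instead. The names `result` and `sorted_result` are illustrative.

```python
import pandas as pd

# Same data as the code sample, using the newer `codes` keyword (assumption: pandas >= 0.24).
levels = [["A", "B", "C", "D", "E"]] * 2
index = pd.MultiIndex(levels=levels, codes=[[4, 4, 4, 3], [4, 2, 0, 1]])

result = pd.Series(0, index=index).unstack()

# Sort rows and columns explicitly instead of relying on unstack() to do it.
sorted_result = result.sort_index(axis=0).sort_index(axis=1)
print(sorted_result)
```

Sorting after the fact costs an extra pass over the frame, but it makes the ordering independent of whichever pandas version is installed.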

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Linux
machine: x86_64
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.1
numpy: 1.13.1

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions
jzwinck commented, Jun 30, 2018

@WillAyd, #15105 asks for a new sort option to be added to unstack. The documented, specified, and actual behaviors have for many years been that it sorts the index. Suddenly in Pandas 0.23, the behavior changed, without a FutureWarning, without a documentation change, and without a mention in #15105, which suggests the change was an accident. Significant functional changes should be made deliberately, with discussion, not slipped in without notice and then documented a few releases later.

I suggest that the longstanding and clearly documented behavior (sorted unstack) should be restored, and then #15105 can continue to explore new ideas such as adding an option to not sort. If the default behavior is to be changed, a FutureWarning could be used to help users transition.

You have marked this as a Docs issue. But it is a regression in Pandas 0.23, and a functional bug in the code, not a cosmetic one in the docs.

1 reaction
jiangyue12392 commented, Jun 27, 2019

Hi @deisdenis, I have also looked into this issue. It seems that the description of remove_unused_levels specifies that the MultiIndex order must be preserved, so no sorting should be done in that function. Maybe an alternative is to add a sort argument to unstack. Are you still interested in this issue? What do you think?
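
The two proposals in the comments above, a `sort` argument and a FutureWarning ahead of any change in the default, could be prototyped outside pandas along these lines. This is only a sketch: `unstack_compat` is a hypothetical helper, not a pandas API, and its parameter names and warning text are assumptions made for illustration.

```python
import warnings

import pandas as pd


def unstack_compat(obj, level=-1, sort=None):
    """Hypothetical wrapper around unstack() with an explicit `sort` flag."""
    if sort is None:
        # Caller did not choose: keep the documented (sorted) behavior, but warn
        # that the default might change, as suggested in the discussion.
        warnings.warn(
            "unstack_compat sorts by default; pass sort=True or sort=False explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        sort = True
    result = obj.unstack(level)
    if sort:
        # Sort both the remaining index and the new columns.
        result = result.sort_index(axis=0).sort_index(axis=1)
    return result
```

Called as `unstack_compat(series, sort=True)`, the helper returns a frame with both axes sorted; with `sort=False` it returns whatever order `unstack()` itself produces.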
