Dataframe.unstack().stack(0) erroneously changes data if indices not initially sorted

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> # Notice that the columns are not sorted below.
>>> df = pd.DataFrame(data=[[0,1],[2,3],[4,5],[6,7]],index = pd.MultiIndex.from_product([['a','b'],['A','B']]),columns=['d','c'])
>>> # The value of the element with indices 'b', 'B', and 'd'
>>> df.loc[('b','B'),'d']
6
>>> # The value of that *same* element now.
>>> df.unstack().stack(0).loc[('b','d'),'B']
7
>>> # What went wrong?
>>> df
     d  c
a A  0  1
  B  2  3
b A  4  5
  B  6  7
>>> # During some step, the indices got sorted but the values did not follow.
>>> df.unstack().stack(0)
     A  B
a d  1  3
  c  0  2
b d  5  7
  c  4  6

Problem description

With a MultiIndexed DataFrame, it is convenient to call unstack(level) and stack(level) repeatedly until the frame has the index layout you need. These methods sort the index or column levels if they were not sorted to begin with.

However, I appear to have found a case where the indices were sorted but the values did not follow, which produces the “shuffling” shown above.

Expected Output

The expected behavior is that these operations never scramble or shuffle data: a complete set of index labels (like {‘b’,‘B’,‘d’}) should always refer to the same value (in this case, 6).
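The expectation above can be turned into a label-by-label check. This is a minimal sketch, not part of the original report: it rebuilds the same frame, applies the same round trip, and collects every label triple whose value changed. The names `reshaped` and `mismatches` are introduced here for illustration. On a pandas version without the bug, the list should come back empty.

```python
import warnings

import pandas as pd

# Same frame as in the report: the columns are deliberately unsorted ('d' before 'c').
df = pd.DataFrame(
    data=[[0, 1], [2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_product([["a", "b"], ["A", "B"]]),
    columns=["d", "c"],
)

with warnings.catch_warnings():
    # Some pandas 2.x releases emit a FutureWarning about the stack implementation.
    warnings.simplefilter("ignore", FutureWarning)
    reshaped = df.unstack().stack(0)

# Collect every (outer, inner, column) triple whose value changed in the round trip.
mismatches = []
for (outer, inner), row in df.iterrows():
    for column, before in row.items():
        after = reshaped.loc[(outer, column), inner]
        if before != after:
            mismatches.append(((outer, inner, column), before, after))

print(mismatches)
```

Looking values up by label on both sides, rather than comparing the two frames positionally, keeps the check independent of whatever row or column order the reshaping produces.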

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

gfyoung commented, Aug 11, 2017

No worries! This was only added to the issue template recently. Just trying to help users help themselves if possible 😀

gfyoung commented, Aug 11, 2017

@joseortiz3 : Thanks for reporting! One thing that we suggest users do is upgrade if possible to the latest version, as we may have already resolved the issue.

I can’t reproduce this in 0.20.3 (latest). Can you upgrade and see if you can still reproduce?
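Before re-running the example, it can help to confirm whether the installed pandas is older than the version the maintainer tested. This is a small sketch using only the standard library. The helper names `version_tuple` and `older_than_retest_version` are hypothetical, and the comparison only reflects that the issue was not reproduced in 0.20.3. It does not say which release fixed it.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


# Version in which the maintainer could not reproduce the problem.
RETEST_VERSION = "0.20.3"


def older_than_retest_version(installed):
    """True if `installed` predates the version the maintainer tested."""
    return version_tuple(installed) < version_tuple(RETEST_VERSION)


print(older_than_retest_version("0.20.1"))  # version from the report
```

In practice the argument would be `pd.__version__` after `import pandas as pd`.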
