DataFrame.unstack().stack(0) erroneously changes data if indices are not initially sorted
Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> # Notice that the columns are not sorted below.
>>> df = pd.DataFrame(data=[[0, 1], [2, 3], [4, 5], [6, 7]],
...                   index=pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']]),
...                   columns=['d', 'c'])
>>> # The value of the element with indices 'b', 'B', and 'd'
>>> df.loc[('b','B'),'d']
6
>>> # The value of that *same* element now.
>>> df.unstack().stack(0).loc[('b','d'),'B']
7
>>> # What went wrong?
>>> df
     d  c
a A  0  1
  B  2  3
b A  4  5
  B  6  7
>>> # During some step, the indices got sorted but the values did not follow.
>>> df.unstack().stack(0)
     A  B
a d  1  3
  c  0  2
b d  5  7
  c  4  6
Problem description
With MultiIndexed DataFrames, it becomes convenient to unstack(level) and stack(level) your DataFrame until it has the indices you need to do what you want to do. These methods will sort your indices or levels if they were not sorted to begin with.
However, I appear to have found a case where the indices were sorted but the values did not follow, resulting in the “shuffling” shown above.
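Since the problem only shows up when the labels are not sorted to begin with, one possible workaround is to sort the columns before reshaping. The sketch below is an assumption drawn from that observation, not a confirmed fix; the names sorted_df, reshaped, before, and after are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0, 1], [2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']]),
    columns=['d', 'c'],  # deliberately not sorted
)

# Assumption: sorting the column labels first sidesteps the mismatch.
sorted_df = df.sort_index(axis=1)  # columns become ['c', 'd']
reshaped = sorted_df.unstack().stack(0)

# Look up the same labelled element before and after reshaping.
before = df.loc[('b', 'B'), 'd']
after = reshaped.loc[('b', 'd'), 'B']
print(before, after)
```

If the two printed values differ, the workaround does not help on that pandas version.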
Expected Output
The expected behavior is that these operations should not result in data scrambling / shuffling; a complete set of indices (like {‘b’,‘B’,‘d’}) should always refer to the same value (in this case, 6).
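The expectation above can be turned into a small check. This is only a sketch: labels_preserved is a hypothetical helper, not part of pandas, and it simply compares every labelled value in the original frame with the value found under the same three labels after unstack().stack(0).

```python
import pandas as pd


def labels_preserved(df):
    """Return True if unstack().stack(0) keeps every labelled value intact.

    Hypothetical helper; assumes a two-level row index and flat columns.
    """
    reshaped = df.unstack().stack(0)
    for (outer, inner), row in df.iterrows():
        for col, value in row.items():
            # After reshaping, the column label sits in the row index
            # and the inner row label has become the column.
            if reshaped.loc[(outer, col), inner] != value:
                return False
    return True


df = pd.DataFrame(
    data=[[0, 1], [2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']]),
    columns=['d', 'c'],
)
print(labels_preserved(df))
```

On a pandas version affected by this issue the check should print False; on a version without the problem it should print True.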
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
No worries! This was only added to the issue template recently. Just trying to help users help themselves if possible 😀
@joseortiz3 : Thanks for reporting! One thing that we suggest users do is upgrade if possible to the latest version, as we may have already resolved the issue.
I can’t reproduce this in 0.20.3 (latest). Can you upgrade and see if you can still reproduce?