BUG: slicing a MultiIndex does not preserve the sequence of the index since pandas 1.2.0rc0
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
The behavior of slicing a MultiIndex changed between 1.1.5 and 1.2.0rc2. Up to 1.1.5, slicing a MultiIndex like df.loc[(slice(None), some_items), :] preserved the row sequence of the original DataFrame in the sliced result. From 1.2.0rc0 on, the sequence is changed.
Code Sample
import pandas as pd
print("pandas version %s" % pd.__version__)
index = pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 7), (1, 6),
                                   (2, 2), (2, 3), (2, 8), (2, 7)])
all_items = index.get_level_values(1)  # all items from level 1
df = pd.DataFrame({'x': range(8)}, index=index)
df_sliced = df.loc[(slice(None), all_items), :]
# df_sliced should be identical to df, as all_items contains all items from level 1
print(df_sliced)
pd.testing.assert_frame_equal(df, df_sliced)
# works if and only if pd.__version__ < 1.2.0rc0
print("Success")
Problem description
Running the sample code in 1.1.5 gives the following output:
pandas version 1.1.5
     x
1 1  0
  2  1
  7  2
  6  3
2 2  4
  3  5
  8  6
  7  7
Success
whereas in 1.2.4 it gives
pandas version 1.2.4
     x
1 1  0
  6  3
  2  1
2 2  4
  3  5
  8  6
1 7  2
2 7  7
Traceback (most recent call last):
  File "tmp/pandas_demo.py", line 19, in <module>
    pd.testing.assert_frame_equal(df, df_sliced)
  File "/home/jmu3si/Devel/pylife/.venv/lib/python3.8/site-packages/pandas/_testing.py", line 1657, in assert_frame_equal
    assert_index_equal(
  File "/home/jmu3si/Devel/pylife/.venv/lib/python3.8/site-packages/pandas/_testing.py", line 805, in assert_index_equal
    assert_index_equal(
  File "/home/jmu3si/Devel/pylife/.venv/lib/python3.8/site-packages/pandas/_testing.py", line 825, in assert_index_equal
    _testing.assert_almost_equal(
  File "pandas/_libs/testing.pyx", line 46, in pandas._libs.testing.assert_almost_equal
  File "pandas/_libs/testing.pyx", line 161, in pandas._libs.testing.assert_almost_equal
  File "/home/jmu3si/Devel/pylife/.venv/lib/python3.8/site-packages/pandas/_testing.py", line 1073, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: MultiIndex level [0] are different
MultiIndex level [0] values are different (25.0 %)
[left]: Int64Index([1, 1, 1, 1, 2, 2, 2, 2], dtype='int64')
[right]: Int64Index([1, 1, 1, 2, 2, 2, 1, 2], dtype='int64')
I am not sure whether this is necessarily a problem. I stumbled across it because a test suite that uses pd.testing.assert_frame_equal() started failing. So either the original sequence should be preserved when slicing a DataFrame, or pd.testing.assert_frame_equal() should not fail when the rows are shuffled but the index contents are otherwise correct.
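Until the ordering question is settled, a test can be made independent of row order. The following is a minimal sketch, assuming the comparison only cares about which rows are present and not about their sequence. It either sorts both indexes before comparing or passes check_like=True, which tells assert_frame_equal to ignore the order of index and columns. The shuffled frame is built by hand (df_shuffled is an illustrative name, not part of the report) so the snippet does not depend on the pandas version in use.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 7), (1, 6),
                                   (2, 2), (2, 3), (2, 8), (2, 7)])
df = pd.DataFrame({'x': range(8)}, index=index)

# Rebuild the row order reported for 1.2.4 by hand (illustrative only)
df_shuffled = df.iloc[[0, 3, 1, 4, 5, 6, 2, 7]]

# Option 1: bring both frames into the same sorted order before comparing
pd.testing.assert_frame_equal(df.sort_index(), df_shuffled.sort_index())

# Option 2: let assert_frame_equal ignore the order of index and columns
pd.testing.assert_frame_equal(df, df_shuffled, check_like=True)
print("Success")
```

Both options hide ordering differences entirely, so they are only appropriate where the row sequence is not itself under test.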
Expected Output
As discussed above.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-71-lowlatency
Version : #79-Ubuntu SMP PREEMPT Wed Mar 24 12:38:51 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.2.4 (resp. 1.1.5)
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.0.post20201006
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.17.0
xlrd : None
xlwt : None
numba : None
Issue Analytics
- Created 2 years ago
- Comments: 6 (6 by maintainers)

The ordering is wrong for duplicates, as in your example. I will fix this shortly, but let's leave this open until then.
Intersection with sort=False is the first thing that comes to mind.
In theory, loc should select in the order of your indexer, so the incoming sequence has to be in the same order somehow.
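If the goal is to keep the frame's own row order rather than the order of the indexer, a boolean mask sidesteps the question, since boolean selection keeps rows in their existing order. The following is a minimal sketch of that workaround, not the fix discussed above; wanted and df_masked are illustrative names that do not appear in the report.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 7), (1, 6),
                                   (2, 2), (2, 3), (2, 8), (2, 7)])
df = pd.DataFrame({'x': range(8)}, index=index)

# Hypothetical selection of level-1 labels, deliberately not in frame order
wanted = [7, 2]

# Boolean mask over level 1: True where the label is one of the wanted items
mask = df.index.get_level_values(1).isin(wanted)

# Boolean selection keeps the rows in the order they have in df
df_masked = df[mask]
print(df_masked)

# Selecting every level-1 label returns the frame unchanged
all_items = index.get_level_values(1)
df_all = df[df.index.get_level_values(1).isin(all_items)]
pd.testing.assert_frame_equal(df, df_all)
```

The trade-off is that the mask ignores the order of wanted altogether, so it does not help when the result is supposed to follow the indexer's sequence.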