Multiindex slicing with NaNs, unexpected results
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame(
    pd.np.random.rand(2, 3),
    columns=pd.MultiIndex.from_tuples(
        [('a', 'foo'), ('b', 'bar'), ('b', pd.np.nan)],
        names=['first', 'second'])
)
# EXPECTED slicing everything on first level
df.loc[:, (['a', 'b'])]
Out[35]:
first          a         b
second       foo       bar       NaN
0       0.678021  0.383672  0.074164
1       0.738492  0.992545  0.661247
# EXPECTED just slicing one value from first level
df.loc[:, (['b'])]
Out[29]:
first          b
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247
# EXPECTED slicing out b, bar
df.loc[:, (['b'], ['bar'])]
Out[33]:
first          b
second       bar
0       0.383672
1       0.992545
# UNEXPECTED slicing out b, nan
df.loc[:, (['b'], [pd.np.nan])]
Out[36]:
Empty DataFrame
Columns: []
Index: [0, 1]
# UNEXPECTED slicing out b, [nan, 'bar']
df.loc[:, (['b'], ['bar', pd.np.nan])]
Out[39]:
first          b
second       bar
0       0.383672
1       0.992545
# EXPECTED slicing out b, nan without the index
df.loc[:, ('b', pd.np.nan)]
Out[37]:
0    0.074164
1    0.661247
Name: (b, nan), dtype: float64
Problem description
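When slicing multiple values from a MultiIndex level with a list of labels that includes NaN, the columns whose label on that level is NaN are not returned. Selecting the single key ('b', nan) as a plain tuple works, but the list-based forms either return an empty DataFrame or silently drop the NaN column.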
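Until list-based selection handles NaN, one possible workaround is to build a boolean mask from the level values and pass it to .loc. The snippet below is only a sketch, not the fix for the underlying bug: it uses numpy directly instead of pd.np, fills the frame with fixed values via np.arange so the result is reproducible, and the names first, second, mask and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same column layout as in the report, but with deterministic data
# (an assumption made here so the selected values are predictable).
columns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("b", "bar"), ("b", np.nan)],
    names=["first", "second"],
)
df = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=columns)

# Pull out the labels of each level as flat Index objects.
first = df.columns.get_level_values("first")
second = df.columns.get_level_values("second")

# Match NaN explicitly with isna() rather than relying on list lookup.
mask = (first == "b") & (second.isin(["bar"]) | second.isna())

# Boolean column selection keeps both ('b', 'bar') and ('b', nan).
result = df.loc[:, mask]
print(result)
```

Dropping the second.isin(["bar"]) term leaves only the ('b', nan) column, which corresponds to the second expected output in this report.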
Expected Output
I expect both of these to work:
df.loc[:, (['b'], ['bar', pd.np.nan])]
Out[40]:
first          b
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247
df.loc[:, (['b'], [pd.np.nan])]
Out[40]:
first          b
second       NaN
0       0.074164
1       0.661247
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.10.0
pip: 18.1
setuptools: 40.5.0
Cython: 0.28.5
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: 0.10.9
IPython: 5.8.0
sphinx: 1.8.1
patsy: 0.5.1
dateutil: 2.7.2
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.7
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Comments:6 (6 by maintainers)
As long as no one is assigned, you are free to go 😃
Hi, I’d like to try and work on adding some tests for this one.