BUG: Setting multiple values via .loc produces NaNs with MultiIndex

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
s = pd.Series([True]*4, index=i)
s.loc[0,:] = s.loc[0,:] 
print(s)

Issue Description

Executing the script above on pandas 1.4.2 produces NaNs:

0  2     NaN
   3     NaN
1  2    True
   3    True

Expected Behavior

Behavior before 1.4.2 (expected):

0  2    True
   3    True
1  2    True
   3    True

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d07b4858144c219b9346329027024102ab6
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Aug 20, 2022

The reason this now fails is similar to some of the cases discussed in https://github.com/pandas-dev/pandas/issues/46704

The getitem operation on the right-hand side now returns a different result:

In [12]: import pandas as pd
    ...: i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
    ...: s = pd.Series([True]*4, index=i)

In [13]: s.loc[0,:]
Out[13]: 
2    True
3    True
dtype: bool

Selecting 0 now drops that level, returning a Series without a MultiIndex. As a result of that change, setting with that Series no longer works. I think one can argue that setting those values should still work, since the setitem operation also selects on the first level, which should have a similar effect. But that is something that also didn’t work in the past:

In [1]: pd.__version__
Out[1]: '1.3.5'

In [2]: import pandas as pd
   ...: i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
   ...: s = pd.Series([True]*4, index=i)

In [3]: s.loc[0, :].droplevel(0)   # <-- the equivalent of what now is returned directly by s.loc[0, :]
Out[3]: 
2    True
3    True
dtype: bool

In [4]: s.loc[0, :] = s.loc[0, :].droplevel(0)

In [5]: s
Out[5]: 
0  2     NaN
   3     NaN
1  2    True
   3    True
dtype: object

0 reactions
simonjayhawkins commented, Aug 30, 2022

But that is something that also didn’t work in the past:

ok, let’s not change that this late in the 1.4.x release cycle. removing milestone.
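
A possible workaround, not proposed in the thread: `.loc` assignment aligns a Series on its index labels, so passing the values as a plain NumPy array sidesteps the alignment step. This is a minimal sketch, assuming the right-hand side has the same length and order as the rows being selected:

```python
import pandas as pd

i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
s = pd.Series([True] * 4, index=i)

# to_numpy() strips the index, so the values are assigned by position
# instead of being aligned on labels (the alignment is what yields NaN).
s.loc[0, :] = s.loc[0, :].to_numpy()

print(s)
```

Because the assignment is positional, it silently writes values to the wrong rows if the right-hand side is ordered differently from the selection, so it only fits cases where the order is known to match.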
