question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BUG: can't use `.loc` with boolean values in MultiIndex

See original GitHub issue

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> df = pd.DataFrame({'a': [True, True, False, False], 'b': [True, False, True, False], 'c': [1,2,3,4]})
>>> df
       a      b  c
0   True   True  1
1   True  False  2
2  False   True  3
3  False  False  4
>>> df.set_index(['a']).loc[True]
          b  c
a
True   True  1
True  False  2
>>> df.set_index(['a', 'b']).loc[True]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1071, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1306, in _getitem_axis
    self._validate_key(key, axis)
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1116, in _validate_key
    raise KeyError(
KeyError: 'True: boolean label can not be used without a boolean index'

Issue Description

The above snippet creates a data frame with two boolean columns. When just one of those columns is set as the index, .loc[] can select rows based on a boolean value. When both columns are set as the index, though, the same value causes .loc[] to raise a KeyError. The error message makes it seems like pandas doesn’t regard a multi-index as a “boolean index”, even if the columns making up the index are all boolean.

In pandas 1.4.3, I can work around this by indexing with 1 and 0 instead of True and False. But in 1.5.0 it seems to be an error to provide a integer value to a boolean index.

Expected Behavior

I would expect to be able to use boolean index values even when those values are part of a multi-index:

>>> df.set_index(['a', 'b']).loc[True]
       c
b
True   1
False  2

Installed Versions

INSTALLED VERSIONS ------------------ commit : 74f4e81ca84a290e2ef617dcbf220776c21f6bca python : 3.10.0.final.0 python-bits : 64 OS : Linux OS-release : 5.18.10-arch1-1 Version : #1 SMP PREEMPT_DYNAMIC Thu, 07 Jul 2022 17:18:13 +0000 machine : x86_64 processor : byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : en_US.UTF-8

pandas : 1.5.0.dev0+1134.g74f4e81ca8 numpy : 1.22.0 pytz : 2021.3 dateutil : 2.8.2 setuptools : 57.4.0 pip : 22.1.2 Cython : 0.29.30 pytest : 7.1.1 hypothesis : 6.39.4 sphinx : 4.3.2 blosc : None feather : None xlsxwriter : None lxml.etree : 4.8.0 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : 3.0.3 IPython : 8.0.1 pandas_datareader: None bs4 : 4.10.0 bottleneck : None brotli : None fastparquet : None fsspec : None gcsfs : None matplotlib : 3.5.1 numba : None numexpr : None odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : 1.8.0 snappy : None sqlalchemy : 1.4.31 tables : None tabulate : 0.8.9 xarray : None xlrd : None xlwt : None zstandard : None

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:6 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
phoflcommented, Aug 1, 2022

This should work. Thanks for the report.

1reaction
dicristinacommented, Jul 15, 2022

As per the user guide, in the MultiIndex case the notation .loc[True] is shorthand for .loc[(True,)]. As a workaround use the tuple notation instead:

import pandas as pd

df = pd.DataFrame({'a': [True, True, False, False], 'b': [True, False, True, False], 'c': [1,2,3,4]})
df.set_index(['a', 'b']).loc[(True,)]

This gives the expected result in 1.4.3:

>>> df.set_index(['a', 'b']).loc[(True,)]
<stdin>:1: PerformanceWarning: indexing past lexsort depth may impact performance.
       c
b       
True   1
False  2
Read more comments on GitHub >

github_iconTop Results From Across the Web

pandas: Boolean indexing with multi index - Stack Overflow
The problem is that the loc method only works with boolean inputs on a 1D index. If you provide the values of the...
Read more >
MultiIndex / advanced indexing - Pandas
Slicing is primarily on the values of the index when using [],ix,loc , and always positional when using iloc . The exception is...
Read more >
Solve Pandas "ValueError: cannot reindex from a duplicate axis"
Apparently, the python error is the result of doing operations on a DataFrame that has duplicate index values. Operations that require unique ...
Read more >
Hierarchical Indexing | Python Data Science Handbook
Similarly, you can construct the MultiIndex directly using its internal encoding by passing levels (a list of lists containing available index values for ......
Read more >
KeyError Pandas – How To Fix - Data Independent
Error catch option: Use df.get('your column') to look for your column value. No error will be thrown if it is not found.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found