Ambiguous behaviour when index is bool type?

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas

# Indexing the value_counts() result with the one-element list [False] raises IndexError.
pandas.Series([True, True, False]).value_counts()[[False]]

Problem description

pandas/core/indexers.py in check_array_indexer(array, indexer)
    468         # GH26658
    469         if len(indexer) != len(array):
--> 470             raise IndexError(
    471                 f"Boolean index has wrong length: "
    472                 f"{len(indexer)} instead of {len(array)}"

IndexError: Boolean index has wrong length: 1 instead of 2

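It seems that Series.__getitem__ and check_bool_indexer do not handle a bool-typed index well: the list [False] is treated as a boolean mask (hence the length check) rather than as a list of index labels.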
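A minimal sketch of the ambiguity, assuming a list made up only of bools is read as a mask rather than as labels (the variable names are illustrative and not from the issue). A one-element list fails the length check, while a list as long as the Series is applied as a mask without complaint:

```python
import pandas as pd

counts = pd.Series([True, True, False]).value_counts()
# counts has two rows: label True (count 2), then label False (count 1).

# A one-element bool list is compared against the Series length and rejected.
try:
    counts[[False]]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# A bool list of matching length is applied as a mask:
# drop the first row, keep the second.
masked = counts[[False, True]]
print(masked)
```

In the second case the result happens to be the row labelled False, but only because the mask keeps the second position, not because False was looked up as a label.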

Expected Output

<pandas.Series>
False    1
dtype: int64

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.4
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Issue Analytics

Created 2 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

kwhkim commented, Aug 31, 2021

I am aware of this edge case. As far as I can tell, a list consisting only of True and False values is treated as a boolean mask, not as index labels.

In this case, I would do

x = pandas.Series([True, True, False]).value_counts()
x[x.index == False]  # build the mask explicitly by comparing the index labels

Or you can slip in some value other than True or False, so that the list is no longer all-bool:

# Series.append was removed in pandas 2.0; pandas.concat is the equivalent here.
pandas.concat([pandas.Series([True, True, False]).value_counts(), pandas.Series([0], index=['None'])])[[True, 'None']]

I think there should be something like .ibool for label-based indexing on a boolean index, for completeness and consistency, even though cases like this rarely come up.
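pandas has no `.ibool` accessor; the name is only a suggestion. As a rough sketch of the label-based behaviour being asked for, a small helper (hypothetical, called `select_bool_labels` here) can build the mask explicitly with `Index.isin`, so that a list of bools is always read as labels:

```python
import pandas as pd


def select_bool_labels(series, labels):
    """Return the rows of `series` whose index label appears in `labels`.

    Hypothetical helper, not part of pandas: it treats `labels` as index
    labels even when every element is a bool.
    """
    mask = series.index.isin(labels)  # boolean array, same length as series
    return series[mask]


counts = pd.Series([True, True, False]).value_counts()
selected = select_bool_labels(counts, [False])
print(selected)
```

Unlike a true label lookup, this keeps the rows in the Series' own order and silently skips labels that are not present.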

I come from R, and from a design perspective, Python (including pandas) has a lot of exceptions to its rules. For example, even importing a module has edge cases: import time never imports a local time.py. I think this is a design problem.

simonjayhawkins commented, Aug 25, 2021