Ambiguous behaviour when index is bool type?
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas

# Intended as a lookup of the label False; on pandas 1.1.4 this raises IndexError.
pandas.Series([True, True, False]).value_counts()[[False]]
Problem description
pandas/core/indexers.py in check_array_indexer(array, indexer)
468 # GH26658
469 if len(indexer) != len(array):
--> 470 raise IndexError(
471 f"Boolean index has wrong length: "
472 f"{len(indexer)} instead of {len(array)}"
IndexError: Boolean index has wrong length: 1 instead of 2
It seems that Series.__getitem__ and check_bool_indexer do not work well with a bool-typed index.
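A minimal sketch of the behaviour described above, assuming only the public pandas API. The helper try_getitem is hypothetical and exists only to show the outcome without stopping on the exception. It illustrates that a list of booleans passed to [] is read as a mask over the rows, so a one-element list cannot be matched against a two-row Series.

```python
import pandas as pd

counts = pd.Series([True, True, False]).value_counts()
# counts is indexed by the labels True and False, with values 2 and 1.


def try_getitem(series, key):
    """Hypothetical helper: return series[key], or the error text if it raises."""
    try:
        return series[key]
    except (IndexError, KeyError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


# Two booleans match the length of counts, so the list is used as a mask
# and only the second row (label False) is kept.
full_length = try_getitem(counts, [False, True])

# One boolean was meant as the label False; the issue reports an IndexError here.
single = try_getitem(counts, [False])

print(full_length)
print(single)
```

The exact result of the second lookup may differ between pandas versions, which is why the sketch reports it instead of asserting it.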
Expected Output
<pandas.Series>
False 1
dtype: int64
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.1.4
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader : None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
Issue Analytics
- Created 2 years ago
- Comments: 6 (4 by maintainers)
I am aware of this edge case. As far as I can tell, a list consisting only of True and False values is treated as a boolean mask, not as index labels. In this case, I would do [...] or you can slip in some value other than True or False.
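The commenter's own snippets did not survive in this copy of the thread. As one possible workaround, and not necessarily the one the commenter had in mind, the label lookup can be turned into an explicit full-length mask with Index.isin, which always compares against labels. The variable names are illustrative.

```python
import pandas as pd

counts = pd.Series([True, True, False]).value_counts()

# Index.isin compares labels, so [False] is read as a label here, not a mask.
# The result is a boolean array with one entry per row of counts.
mask = counts.index.isin([False])

# A full-length mask is unambiguous for [] indexing.
only_false = counts[mask]
print(only_false)
```

This sidesteps the ambiguity because the mask always has the same length as the Series, whatever dtype the index happens to have.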
I think there should be something like .ibool for boolean indexing, for completeness and consistency, even though there is little chance that things like this happen.
I am from R. From a design perspective, Python, including pandas, has so many exceptions. For example, even importing a module has an edge case: import time never imports time.py. I think it's a poor design problem.
Hmm. This does make the result inconsistent with, say, pandas.Series([1, 2, 3])[[0]].
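For comparison, a short sketch of why the integer case raises no such question. On an integer-labelled Series, a list of ints and a list of bools have different element types, so the two selection modes can coexist. Names are illustrative.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])  # default integer index 0, 1, 2

# A list of ints is looked up against the index: the row labelled 0.
by_label = ints[[0]]

# A list of bools is applied as a mask: rows where the entry is True.
by_mask = ints[[True, False, False]]

print(by_label)
print(by_mask)
```

Both lookups select the same single row here. With a bool-labelled index the element type no longer separates the two modes, which is the inconsistency raised in the comment.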
A workflow could potentially have its index generated from unknown data, e.g. as in the OP, from value_counts.
I think we should mark this as a bug, pending further investigation into how feasible it is to distinguish between a fancy indexer and a boolean indexer.
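To illustrate why telling the two kinds of indexer apart may be hard, here is a sketch in which the key has the same length as the Series, so both readings are valid and give different answers. Using reindex to emulate the label reading is an assumption made for illustration, not a proposal from the thread.

```python
import pandas as pd

s = pd.Series([2, 1], index=[True, False])
key = [False, True]

# Reading 1: boolean mask, which is what [] does with a list of bools.
# Keeps only the second row (label False).
as_mask = s[key]

# Reading 2: labels, emulated with reindex, which looks up labels.
# Returns both rows, reordered to False, True.
as_labels = s.reindex(key)

print(as_mask)
print(as_labels)
```

A length check alone therefore cannot settle which reading the caller intended. It only catches cases like the one in the report, where the lengths differ.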