Ambiguous behaviour when index is bool type?

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas

pandas.Series([True, True, False]).value_counts()[[False]]

Problem description

pandas/core/indexers.py in check_array_indexer(array, indexer)
    468         # GH26658
    469         if len(indexer) != len(array):
--> 470             raise IndexError(
    471                 f"Boolean index has wrong length: "
    472                 f"{len(indexer)} instead of {len(array)}"

IndexError: Boolean index has wrong length: 1 instead of 2

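It seems that Series.__getitem__ and check_bool_indexer do not handle a bool-typed index well: the key [False] is treated as a boolean mask (hence the length check) rather than as a list of index labels.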

Expected Output

<pandas.Series>
False    1
dtype: int64
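
As a minimal sketch of how the expected output can be obtained without passing a bare list of booleans, the mask can be built explicitly from the index with Index.isin. The variable names here are illustrative and not part of the original report.

```python
import pandas as pd

counts = pd.Series([True, True, False]).value_counts()

# Build an explicit mask with the same length as the index, so pandas
# never has to guess whether [False] is a mask or a list of labels.
mask = counts.index.isin([False])
selected = counts[mask]

print(selected)
```

Because the mask always has the same length as the index, this form does not hit the length check shown in the traceback.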

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.4
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
kwhkim commented, Aug 31, 2021

I am aware of this edge case. As far as I can tell, a list consisting only of True and False values is treated as a boolean mask, not as index labels.

In this case, I would do

import pandas

x = pandas.Series([True, True, False]).value_counts()
x[x.index == False]  # explicit mask built from the index

Or you can slip in a value other than True or False, so that the key is no longer an all-boolean list:

pandas.Series([True, True, False]).value_counts().append(pandas.Series([0], index = ['None']), verify_integrity=False)[[True, 'None']]
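
Series.append was removed in pandas 2.0, so the one-liner above does not run on current releases. A rough equivalent using pandas.concat, keeping the same 'None' sentinel label, might look like this; the variable names are illustrative.

```python
import pandas as pd

counts = pd.Series([True, True, False]).value_counts()

# Add a row with a non-boolean label so the key below is a mixed list.
sentinel = pd.Series([0], index=["None"])
extended = pd.concat([counts, sentinel])

# [True, "None"] is not an all-boolean list, so it is looked up as labels.
picked = extended[[True, "None"]]

print(picked)
```

The result still carries the sentinel row, which has to be dropped afterwards if it is not wanted.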

I think there should be something like .ibool for boolean indexing, for completeness and consistency, even though there is little chance that things like this happen.

I come from R. From a design perspective, Python, including pandas, has many exceptions. For example, even importing a module has an edge case: import time never imports a local time.py. I think this is a design problem.

1 reaction
simonjayhawkins commented, Aug 25, 2021

There are far more practical uses for an Indexer with Boolean values than trying to index True/False indexed DataFrame.

Hmm. This does make the result inconsistent with, say, pandas.Series([1, 2, 3])[[0]].

A workflow could potentially have the index generated from unknown data, e.g., as in the OP, from value_counts.

I think we should mark this as a bug, pending further investigation of how feasible it is to distinguish between a fancy indexer and a boolean indexer.
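
The inconsistency, and the reason the two kinds of indexer are hard to tell apart, can be sketched as follows. This is an illustration only: the variable names are made up, and the IndexError branch reflects the behaviour reported in this issue rather than a guarantee for every pandas version.

```python
import pandas as pd

# Integer list on a default index: selects the row labelled 0.
ints = pd.Series([1, 2, 3])
by_int = ints[[0]]

counts = pd.Series([True, True, False]).value_counts()  # index: True, False

# Boolean list shorter than the index: the OP reports an IndexError here.
try:
    short_key = counts[[False]]
except IndexError as exc:
    short_key = exc

# Boolean list as long as the index: silently used as a mask, so
# [False, True] keeps only the second row instead of returning both labels.
same_length = counts[[False, True]]

print(by_int.tolist(), type(short_key).__name__, same_length.tolist())
```

The same-length case is the awkward one: nothing in the key itself says whether the caller meant a mask or a list of labels.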
