BUG: boolean indexing error with .drop()

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'acol': np.arange(4),
    'bcol': 2 * np.arange(4),
})
# Passes a boolean Series where .drop() expects index labels
df.drop(df.bcol > 2, axis=0, inplace=True)

print(df)

Expected Output

	acol	bcol
0	0	0
1	1	2

Observed Output

	acol	bcol
2	2	4
3	3	6

Problem description

The anticipated behavior was that rows with bcol > 2 would be dropped. What actually happens is that the boolean values are converted to 0/1 and then treated as index labels, so the rows labelled 0 and/or 1 are dropped while all other rows are kept.
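For comparison, here is a minimal sketch of two common ways to get the anticipated result with the same frame. The variable names (mask, kept, dropped) are illustrative only and do not come from the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"acol": np.arange(4), "bcol": 2 * np.arange(4)})

# Boolean Series marking the rows that should go away
mask = df["bcol"] > 2

# Option 1: boolean indexing, keep the rows where the mask is False
kept = df[~mask]

# Option 2: turn the mask into index labels first, then hand those to .drop()
dropped = df.drop(df[mask].index)

print(kept)
```

Both options leave only the rows labelled 0 and 1, which matches the expected output above.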

The documentation did not make it clear what was happening.

Solutions might include documentation clarifying that .drop() cannot be used with boolean indexing, or a warning when receiving the (attempted) boolean index.
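As a rough illustration of the second suggestion, a user-side guard could look something like the sketch below. The helper name drop_rows_checked is hypothetical and is not part of pandas; it also raises instead of warning, and it only shows the idea, not what the maintainers implemented.

```python
import numpy as np
import pandas as pd


def drop_rows_checked(df, labels):
    """Hypothetical wrapper: reject boolean masks, otherwise defer to DataFrame.drop."""
    # A boolean Series, array, or list converts to a NumPy array of dtype bool
    if np.asarray(labels).dtype == bool:
        raise TypeError(
            "drop() expects index labels, not a boolean mask; "
            "use df[~mask] or df.drop(df[mask].index) instead"
        )
    return df.drop(labels, axis=0)


df = pd.DataFrame({"acol": np.arange(4), "bcol": 2 * np.arange(4)})

# A boolean mask is refused instead of being silently treated as labels
try:
    drop_rows_checked(df, df["bcol"] > 2)
    mask_rejected = False
except TypeError:
    mask_rejected = True

# Ordinary index labels still go through unchanged
by_label = drop_rows_checked(df, [2, 3])
```

One caveat of this sketch: a frame whose index really is labelled True/False would have its legitimate labels rejected as well, so the check suits integer- or string-labelled indexes only.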

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.12.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.5.0a1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: 0.1.1
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions
andrejonasson commented, Jul 15, 2017

Hi, I’m working on this issue.

0 reactions
gfyoung commented, Jul 14, 2017

Fair enough. I feel like this should just be allowed, but given the confusion it’s generated amongst users (two independent issues), I concede 😄
