BUG: boolean indexing error with .drop()
See original GitHub issueCode Sample, a copy-pastable example if possible
df = pd.DataFrame( data = {
'acol' : np.arange(4),
'bcol' : 2*np.arange(4)
})
df.drop(df.bcol > 2, axis=0, inplace=True)
print(df)
Expected Output
acol bcol
0 0 0
1 1 2
Observed Output
acol bcol
2 2 4
3 3 6
4 4 8
Problem description
The anticipated behavior was that rows with bcol
> 2 would be dropped. The actual behavior is that the boolean gets converted to 0/1, and then treated as index label. So row numbers 0 and/or 1 are dropped… but all other rows will be kept.
The documentation did not make it clear what was happening.
Solutions might include documentation clarifying that .drop()
cannot be used with boolean indexing, or a warning when receiving the (attempted) boolean index.
Output of pd.show_versions()
pandas: 0.20.2 pytest: 3.1.2 pip: 9.0.1 setuptools: 33.1.1.post20170320 Cython: 0.25.2 numpy: 1.12.1 scipy: 0.19.1 xarray: 0.9.6 IPython: 6.1.0 sphinx: 1.6.3 patsy: 0.4.1 dateutil: 2.6.0 pytz: 2017.2 blosc: None bottleneck: 1.2.0 tables: 3.4.2 numexpr: 2.6.2 feather: None matplotlib: 2.0.2 openpyxl: 2.5.0a1 xlrd: 1.0.0 xlwt: 1.2.0 xlsxwriter: 0.9.6 lxml: 3.8.0 bs4: 4.5.3 html5lib: 0.9999999 sqlalchemy: 1.1.11 pymysql: 0.7.9.None psycopg2: 2.7.1 (dt dec pq3 ext lo64) jinja2: 2.9.5 s3fs: 0.1.1 pandas_gbq: None pandas_datareader: None
Issue Analytics
- State:
- Created 6 years ago
- Comments:8 (8 by maintainers)
Top GitHub Comments
Hi, I’m working on this issue.
Fair enough. I feel like this should just be allowed, but given the confusion it’s generated amongst users (two independent issues), I concede 😄