Incorrect boolean bitwise conversion when dtype=object
Code Sample, a copy-pastable example if possible
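import pandas as pd
# The None element forces the Series to dtype=object
s = pd.Series([False, True, None])
# Replace the null with False; the dtype stays object
s[s.isnull()] = False
print('incorrect:')
# Bitwise inversion on the object-dtype Series, then a cast to bool
print((~s).astype(bool))
print('\ncorrect:')
# Logical negation applied to each element
print(s.apply(lambda x: not x))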
Problem description
The above code will produce the following unexpected output:
incorrect:
0    True
1    True
2    True
dtype: bool

correct:
0     True
1    False
2     True
dtype: bool
The same issue persists with np.nan or other NULL-like values. NULL-like values are commonly encountered in data. It is very convenient and common for data scientists to use bitwise operators like |, &, and ~ to slice and dice pandas Series and DataFrames. The bitwise operator ~ produces unexpected results when some of the data has been NULL in the past (implicitly causing the dtype to become object), even after the offending elements have been coerced to bool. This can cause subtle problems in data filtering that will go undetected. If this is not fixed, use of bitwise operators on dtype=object should at least raise a warning.
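A minimal sketch of why every element comes out True, assuming the object-dtype path simply applies Python's own ~ to each element in turn. The helper name invert_each is made up for this illustration and is not part of pandas. Python's bool is a subclass of int, so ~False is -1 and ~True is -2, and both are truthy when cast back to bool:

```python
import warnings


def invert_each(values):
    """Apply Python's ~ to each element (hypothetical helper, not a pandas API)."""
    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning for ~ on bool; silence it here.
        warnings.simplefilter("ignore", DeprecationWarning)
        return [~v for v in values]


inverted = invert_each([False, True, False])
print(inverted)                     # [-1, -2, -1]
print([bool(v) for v in inverted])  # [True, True, True]
```

No pandas is needed to reproduce the surprise: any nonzero integer is truthy, so the cast back to bool cannot recover the intended negation.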
Expected Output
0     True
1    False
2     True
dtype: bool
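One possible workaround, offered as a suggestion rather than something taken from the issue thread: cast the Series to bool before inverting, so that ~ operates on a real boolean dtype. The sketch builds the object-dtype Series directly and assumes the nulls have already been replaced:

```python
import pandas as pd

# Object-dtype Series holding only Python bools, as in the report
# once the null has been replaced with False.
s = pd.Series([False, True, False], dtype=object)

# Cast first, then invert: ~ now acts as logical negation.
fixed = ~s.astype(bool)

print(fixed.tolist())  # [True, False, True]
print(fixed.dtype)     # bool
```

The order matters: (~s).astype(bool) inverts the Python objects first and casts afterwards, which is the pattern that produces the all-True result reported here.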
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.4
pytz : 2018.7
dateutil : 2.7.5
pip : 19.3.1
setuptools : 41.6.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.3
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:9 (7 by maintainers)
I don’t think that’s possible, since object dtype can include anything, including Python objects where ~ is perfectly sensible.

This would be fixed by https://github.com/pandas-dev/pandas/pull/23313, and implementing BooleanArray.__invert__ on the new boolean array.

Seems reasonable to me, let’s see what the core team thinks (I’m relatively new to issue triage). Could we just add a warning to __invert__ from pandas.core.generic.py in the case that self is an object?
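The warning idea can be tried out in user code without touching pandas itself. The following is only a sketch: invert_with_warning is a hypothetical helper, not a pandas function, and it does not reflect how the maintainers would implement a warning inside __invert__. It warns on object dtype and then inverts as usual, which keeps ~ working for objects where it is sensible, such as integers:

```python
import warnings

import pandas as pd


def invert_with_warning(s):
    """Invert a Series, warning first if it has object dtype.

    Hypothetical user-side helper; not part of the pandas API.
    """
    if pd.api.types.is_object_dtype(s.dtype):
        warnings.warn(
            "~ on an object-dtype Series is applied to each Python object; "
            "cast to bool first if logical negation is intended",
            UserWarning,
            stacklevel=2,
        )
    return ~s


# Integers stored as objects: ~ is meaningful, but the warning still fires.
ints_as_object = pd.Series([1, 2], dtype=object)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = invert_with_warning(ints_as_object)

print(result.tolist())  # [-2, -3]
print([str(w.message) for w in caught])
```

A Series with a real bool dtype passes through without the warning, so the helper only speaks up in the ambiguous case.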