BUG: set_index on more than 1 column changes boolean values
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# create df with booleans and 0,1
df = pd.DataFrame({'group_a': [1,2,1,2], 'group_b': [0,1,True,False], 'value':range(4)})
print(df)
# True and False were changed
print(df.set_index(['group_a', 'group_b']))
# Problem doesn't happen if 0/1 are not present in the column
df = pd.DataFrame({'group_a': [1,2,1,2], 'group_b': [2,3,True,False], 'value':range(4)})
print(df)
print(df.set_index(['group_a', 'group_b']))
Issue Description
Calling set_index with more than one column may change boolean values in the index to integers or floats.
This only happens if 0, 0.0, 1, or 1.0 appear in the same column as True/False.
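A likely explanation, offered as an assumption rather than a confirmed diagnosis of the pandas internals: in Python, bool is a subclass of int, so True and 1 (and False and 0) compare equal and hash identically. Any hash-based de-duplication therefore keeps only the first-seen representative of each pair. The sketch below uses plain Python only, not pandas, to show the effect on the values from the example above.

```python
# bool is a subclass of int, so these pairs are equal and share a hash.
values = [0, 1, True, False]

assert True == 1 and False == 0
assert hash(True) == hash(1) and hash(False) == hash(0)

# Hash-based de-duplication keeps the first-seen representative only.
unique_first_seen = list(dict.fromkeys(values))
print(unique_first_seen)  # [0, 1]

# Mapping every value back through its representative turns
# True into 1 and False into 0.
lookup = {v: v for v in unique_first_seen}
round_tripped = [lookup[v] for v in values]
print(round_tripped)  # [0, 1, 1, 0]
print([type(v).__name__ for v in round_tripped])  # ['int', 'int', 'int', 'int']
```

This would also be consistent with the second example in the report: with 2 and 3 in the column, nothing collides with True or False, so each value keeps its own representative.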
Expected Behavior
set_index should not change values.
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d07b4858144c219b9346329027024102ab6
python : 3.8.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.4.0
Version : Darwin Kernel Version 20.4.0: Fri Mar 5 01:14:14 PST 2021; root:xnu-7195.101.1~3/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.3.0
Cython : 0.29.28
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : 1.10.6
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.3.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.2
numba : 0.53.1
numexpr : 2.8.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.02.0
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None
Top GitHub Comments
OK.
I would gladly help find other places where this behavior occurs, or help implement a fix.
I think this needs more discussion; my current thought is that while this behavior is undesirable, modifying it may be more undesirable.
Another example: indexing with a Boolean value is specifically handled by pandas, but indexing with a float is not.