BUG: Summing a series of bools doesn't return a number
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.Series([True, False], dtype="object").cumsum()
Issue Description
Certain reductions (e.g., sum, prod) on a Series that has object dtype and holds a single bool return a bool instead of an int. Other operations (e.g., mean, median) are not affected.
>>> pd.Series([True], dtype="object").sum()
True
>>> pd.Series([True, False], dtype="object").sum()
1
>>> pd.Series([True], dtype="object").mean()
1.0
Cumsum and cumprod are a bit more interesting, as the first element in the resulting series is a bool.
>>> pd.Series([True, False], dtype="object").cumsum()
0    True
1       1
dtype: object
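A plausible explanation, assuming pandas hands object-dtype reductions to NumPy (an assumption; the issue does not confirm the code path), is that NumPy reduces object arrays with plain Python addition. With a single element no addition takes place, so the bool passes through unchanged. The sketch below reproduces the same pattern with NumPy alone:

```python
import numpy as np

# Object arrays store Python objects; reductions call the objects' own __add__.
single = np.array([True], dtype=object)
pair = np.array([True, False], dtype=object)

single_sum = single.sum()   # one element: nothing is added, the bool is returned as-is
pair_sum = pair.sum()       # True + False is Python int addition
running = pair.cumsum()     # the first element is copied through before any addition

print(type(single_sum).__name__, type(pair_sum).__name__)
print([type(x).__name__ for x in running])
```

If this reading is right, the bool in position 0 of the cumsum result and the bool returned by sum on a one-element Series have the same cause.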
Expected Behavior
I expect the results to be the same as when the dtype is bool, e.g.,
>>> pd.Series([True, False]).cumsum()
0    1
1    1
dtype: int64
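One way to get this behaviour today is to convert the object Series to bool dtype before reducing. This is a minimal sketch of a workaround, not a fix for the underlying issue, and it assumes the data contains no missing values (`astype(bool)` would silently turn `None` into `False`):

```python
import pandas as pd

# Object-dtype Series holding only Python bools (no missing values assumed).
obj = pd.Series([True, False], dtype="object")

# Casting to the NumPy bool dtype makes reductions and accumulations integer-valued.
cumulative = obj.astype(bool).cumsum()
total = pd.Series([True], dtype="object").astype(bool).sum()

print(cumulative)
print(total, type(total).__name__)
```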
Installed Versions
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.9.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-125-generic
Version : #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.4.3
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 59.4.0
pip : 21.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.33.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
@mzeitlin11 Nullable booleans do seem to be what I want. Thanks for the advice. I’m new to pandas, so it’s much appreciated.
@marberts going to close then to keep future discussion around the other issues mentioned. If you run into any issues using nullable booleans, feel free to open a new issue of course!
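For readers landing on this thread, here is a small sketch of the nullable boolean approach mentioned above. It uses the `"boolean"` extension dtype; the sample values are made up for illustration, and only `sum` is shown because the behaviour of other operations on this dtype may vary between pandas versions:

```python
import pandas as pd

# The nullable "boolean" dtype keeps missing values as pd.NA instead of
# falling back to object dtype.
single = pd.Series([True], dtype="boolean")
with_missing = pd.Series([True, None, False], dtype="boolean")

single_total = single.sum()         # an integer, even with a single element
missing_total = with_missing.sum()  # missing values are skipped by default

print(single_total, type(single_total).__name__)
print(missing_total)
```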