REGR: Series repr of object Index with bools and NaN is wrong
Code Sample, a copy-pastable example if possible
import pandas as pd
# a series with booleans and nan
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False)
True 2
False 1
True 1
dtype: int64
# check the actual index of the value_counts result
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False).index
Index([True, False, nan], dtype='object')
# a similar Series, with ints instead of bools, seems to work ok
pd.Series([0,1,1,pd.NA]).value_counts(dropna=False)
1.0 2
0.0 1
NaN 1
dtype: int64
Problem description
As shown in the example, the repr of the value_counts result is wrong when the Series contains booleans. It should show nan as one of the values; instead, nan is displayed as True. Note that this is limited to Series with booleans: a similar example with ints works fine.
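One way to see why only the boolean case is affected is to compare the dtype of the two value_counts indexes. The snippet below is a minimal sketch, not taken from the original report; it assumes np.nan as the missing value, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: np.nan stands in for the missing value used in the report.
bool_counts = pd.Series([False, True, True, np.nan]).value_counts(dropna=False)
int_counts = pd.Series([0, 1, 1, np.nan]).value_counts(dropna=False)

# Bools mixed with NaN have no common numeric dtype, so the labels
# are kept as generic Python objects.
print(bool_counts.index.dtype)  # object

# Ints mixed with NaN are upcast to floats, which is why the working
# example shows 1.0 / 0.0 / NaN.
print(int_counts.index.dtype)  # float64
```

The object-dtype index is the case named in the issue title, so any formatting problem specific to object labels would show up only in the first Series.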
Output of pd.show_versions()
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
And the bug should be somewhere in here:
Not sure if this is related, but it seems similar: numpy.nan shows as True in the index.
This True is different from the regular True in the value counts.
But the correct value is still used “under the hood”.
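To check what is stored "under the hood" without relying on the Series repr, the labels can be read back as plain Python objects and the missing entry selected with a mask. This is a hedged sketch rather than the code from the original comment; it assumes np.nan as the missing value, and names such as missing_count are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: np.nan stands in for the missing value used in the report.
counts = pd.Series([False, True, True, np.nan]).value_counts(dropna=False)

# Read the labels as plain Python objects, bypassing the Series repr.
labels = counts.index.tolist()

# Select the entry whose label is missing with a boolean mask
# instead of looking it up by label.
missing_count = int(counts[counts.index.isna()].sum())

# Pair each stored label with its count for a view that does not
# depend on how pandas formats the index.
pairs = list(zip(labels, counts.tolist()))
print(pairs)
print(missing_count)
```

If exactly one label is reported as missing here while the printed Series shows two True rows, the problem is in the formatting of the index and not in the counted data.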