BUG: value_counts and nunique behave differently for NaN and None with dropna=False
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
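import numpy as np
import pandas as pd

s = pd.Series([None, np.nan, 'Y'])
s.nunique(dropna=False)       # returns 3 on pandas 1.3.0
s.value_counts(dropna=False)  # returns only 2 rows on pandas 1.3.0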
Problem description
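nunique returns 3, but value_counts returns only 2 rows: value_counts treats None and NaN as a single missing value, while nunique counts them as two distinct values. This may be related to issue https://github.com/pandas-dev/pandas/issues/37566.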
Expected Output
I would expect these to be consistent, i.e. s.nunique(dropna=False) should equal len(s.value_counts(dropna=False)).
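To make the expected consistency concrete, here is a toy model in plain Python. It is not the pandas implementation, and all helper names and sentinel keys below are hypothetical. Both counts are derived from one rule for missing values, so they agree whichever rule is chosen: merging None and NaN gives 2 distinct values (matching the 2 rows reported for value_counts), while keeping them apart gives 3 (matching the result reported for nunique).

```python
import math
from collections import Counter

# Hypothetical sentinel keys for missing values. NaN is not equal to itself,
# so using it directly as a dict key is fragile; fixed strings avoid that.
# (Toy only: a real string equal to a sentinel would collide with it.)
MISSING = "<missing>"
NONE_KEY = "<None>"
NAN_KEY = "<NaN>"


def is_missing(x):
    """Toy missing-value test: None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def missing_key(x, merge_missing):
    """Return the bucket key for a value under the chosen missing-value rule."""
    if not is_missing(x):
        return x
    if merge_missing:
        return MISSING  # None and NaN share one bucket
    return NONE_KEY if x is None else NAN_KEY  # None and NaN kept apart


def toy_value_counts(values, merge_missing):
    """Count every value, keeping missing ones (analogous to dropna=False)."""
    return Counter(missing_key(x, merge_missing) for x in values)


def toy_nunique(values, merge_missing):
    """Number of distinct values, derived from the same counting rule."""
    return len(toy_value_counts(values, merge_missing))


values = [None, float("nan"), "Y"]
for merge in (True, False):
    counts = toy_value_counts(values, merge)
    print(merge, dict(counts), toy_nunique(values, merge))
# True {'<missing>': 2, 'Y': 1} 2
# False {'<None>': 1, '<NaN>': 1, 'Y': 1} 3
```

Which rule pandas should adopt is the open question here; the sketch only shows that deriving both methods from a single rule rules out the mismatch.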
Output of pd.show_versions()
pandas : 1.3.0
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 21.1.2
setuptools : 56.0.0
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.24.1
pandas_datareader : None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.5
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.3.23
tables : None
tabulate : None
xarray : 0.17.0
xlrd : 2.0.1
xlwt : None
numba : 0.52.0
He expects both calls to return the same result, i.e. 3 I think, since the nunique call seems correct.
If the change is needed in the value_counts function to make it consistent with nunique, can I take up this issue and open a PR based on the code changes suggested by @realead? I am new to open source and would like to work on this issue.