BUG: Replacing NaN with None in Pandas 1.3 does not work
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Pandas 1.2
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
      0
0   0.5
1  None
Pandas 1.3
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  NaN
Problem description
Replacing NaN values with None (or any other Python object) should work as in previous Pandas versions.
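To see which behavior a given installation has, the result can be inspected directly instead of relying on the printed repr. This is a minimal sketch that only reports what it finds; the variable names are illustrative, and the outcome depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([0.5, np.nan])
result = df.where(pd.notnull(df), None)

# Row 1 of the single column is the entry that was NaN in the input.
missing = result.iloc[1, 0]

info = {
    "pandas_version": pd.__version__,
    # object dtype can hold None; float64 can only hold NaN
    "dtype": str(result.dtypes.iloc[0]),
    "missing_is_none": missing is None,
}
print(info)
```

If the reported dtype is float64, the None passed to where() was stored as NaN, which matches the 1.3 output shown above.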
Expected Output
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
      0
0   0.5
1  None
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.0.0
Version : Darwin Kernel Version 21.0.0: Sun Jun 20 18:43:49 PDT 2021; root:xnu-8011.0.0.121.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8

pandas : 1.3.0
numpy : 1.18.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.4.20
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
That was apparently changed on purpose (https://github.com/pandas-dev/pandas/pull/39761), but it is indeed not very convenient. For example, we often want to replace NaN with None just before converting the DataFrame back to a dict (df.to_dict) or another data structure. I'm pretty sure that this change alone breaks many existing code bases.
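For the to_dict use case, one commonly used approach is to cast to object explicitly before calling where(). The following is a sketch under that assumption, not necessarily the approach the maintainers recommend; the column name "value" is hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# "value" is a hypothetical column name used only for this illustration.
df = pd.DataFrame({"value": [0.5, np.nan]})

# Cast to object first so the column is able to hold None,
# then replace the entries that were NaN in the original frame.
cleaned = df.astype(object).where(df.notna(), None)

# Convert to plain Python structures, e.g. before JSON or database serialization.
records = cleaned.to_dict(orient="records")
print(records)
```

The explicit astype(object) makes the dtype change visible in the code, at the cost of losing float64 arithmetic performance on the converted frame.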
This is the right way to do this.
The 1.2 behavior referenced by the OP also converts to object, so if you want None in your Series, there is no way to do this.
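The point about object dtype can be checked without where() at all. This is a minimal sketch comparing a float64 Series with an object Series built from the same values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# With the default dtype inference, None is coerced to NaN and the dtype is float64.
as_float = pd.Series([0.5, None])

# With an explicit object dtype, the Series stores Python objects, so None is kept.
as_object = pd.Series([0.5, None], dtype=object)

print(as_float.dtype, as_float.iloc[1])    # float64 nan
print(as_object.dtype, as_object.iloc[1])  # object None
```

A column therefore has to be object dtype before it can carry None, regardless of which method is used to put it there.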