BUG: Replacing NaN with None in Pandas 1.3 does not work

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

Pandas 1.2

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
      0
0   0.5
1  None

Pandas 1.3

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  NaN

Problem description

Replacing NaN values with None (or any other Python object) should work as in previous Pandas versions.

Expected Output

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
      0
0   0.5
1  None

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit           : f00ed8f47020034e752baf0250483053340971b0
python           : 3.8.10.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.0.0
Version          : Darwin Kernel Version 21.0.0: Sun Jun 20 18:43:49 PDT 2021; root:xnu-8011.0.0.121.4~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : es_ES.UTF-8
LOCALE           : es_ES.UTF-8

pandas           : 1.3.0
numpy            : 1.18.5
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.3
setuptools       : 57.1.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.3.7
lxml.etree       : 4.6.2
html5lib         : None
pymysql          : None
psycopg2         : 2.9.1 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 7.20.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 3.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.3.3
sqlalchemy       : 1.4.20
tables           : None
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : None
numba            : None

Top GitHub Comments

10 reactions

lerome commented, Jul 7, 2021

That was apparently changed on purpose: https://github.com/pandas-dev/pandas/pull/39761. It is not very convenient, though: for example, we often want to replace NaN with None just before converting the dataframe back to a dict (df.to_dict) or another data structure. I’m pretty sure that this change alone breaks many existing code bases.
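
For the to_dict use case mentioned above, one possible workaround is to cast the frame to object dtype before masking, so the column is able to hold None. This is only a sketch under that assumption, not something taken from the linked PR, and the column name "a" is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column name "a" is only for illustration.
df = pd.DataFrame({"a": [0.5, np.nan]})

# Cast to object first so the column can store None,
# then replace every missing cell with None.
cleaned = df.astype(object).where(df.notna(), None)

# Convert to plain Python records, e.g. before serializing.
records = cleaned.to_dict("records")
print(records)
```

The cast gives up the float64 dtype, so it is probably best applied as the last step, right before the data leaves pandas.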

1 reaction

jbrockmendel commented, Jul 9, 2021