BUG: Series.astype is unable to handle NaN
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[2.0, np.nan, 1.0]})
df = df.astype(object)
def iif(a, b, c):
    if a:
        return b
    else:
        return c
df["a"].astype(object).apply(lambda r: iif(np.isnan(r),None,r)).astype(object).apply(lambda r: iif(np.isnan(r),None,r))
df
Issue Description
The result is:
0 2.0
1 NaN
2 1.0
Expected Behavior
The result should be:
0 2.0
1 None
2 1.0
I checked "BUG: Replacing NaN with None in Pandas 1.3 does not work", but it seems astype doesn't work here either.
Installed Versions
INSTALLED VERSIONS
commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.950

pandas : 1.4.1
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.1.1
setuptools : 56.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
Issue Analytics
- State:
- Created 2 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Thanks for the report! I wonder if apply is coercing the result into a numeric dtype. I would agree this is not expected; further investigation and PRs to fix are welcome!
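One way to test the hunch about apply is to look at the dtype before and after the call. This is only a minimal sketch, not a confirmed diagnosis: the helper name nan_to_none is hypothetical, it uses pd.isna rather than np.isnan so that it also accepts None, and the dtype printed after apply may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def nan_to_none(value):
    # Hypothetical helper: map missing values to None, pass others through.
    return None if pd.isna(value) else value


as_object = pd.Series([2.0, np.nan, 1.0]).astype(object)
applied = as_object.apply(nan_to_none)

# If apply re-infers a dtype from the returned values, the dtype printed
# after apply will no longer be object and the None will show up as NaN.
print("before apply:", as_object.dtype)
print("after apply: ", applied.dtype)
print("missing entry:", repr(applied.iloc[1]))
```

If the second dtype is numeric, the None values were converted inside apply rather than by astype.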
As things stand, there are various places where we treat None as np.nan, and this behavior is consistent with that.
That said, I have a hunch that it would be better if pandas were to treat None as a Python object instead of np.nan. However, this issue needs more study for me to feel more certain of this.
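The treatment of None as a missing value described in this comment can be seen directly, and it suggests one possible workaround for the original report. The following is a sketch under assumptions, not an official recommendation: it sidesteps apply and astype by building a plain Python list and passing dtype=object to the Series constructor, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# pandas' missing-value check does not distinguish None from NaN.
print(pd.isna(None), pd.isna(np.nan))

# Without an explicit dtype, None in numeric data is stored as NaN.
inferred = pd.Series([2.0, None, 1.0])
print(inferred.dtype)

# Workaround sketch: build the values in plain Python and request
# dtype=object explicitly so the None objects are kept as they are.
s = pd.Series([2.0, np.nan, 1.0])
with_none = pd.Series(
    [None if pd.isna(v) else v for v in s],
    index=s.index,
    dtype=object,
)
print(with_none.tolist())
```

The trade-off is that the result is an object-dtype Series, so vectorized numeric operations on it will be slower than on float64 data.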