BUG: Series.astype is unable to handle NaN

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[2.0, np.nan, 1.0]})
df = df.astype(object)
def iif(a, b, c):
    if a:
        return b
    else:
        return c

df["a"].astype(object).apply(lambda r: iif(np.isnan(r),None,r)).astype(object).apply(lambda r: iif(np.isnan(r),None,r))
df

Issue Description

The result is:

0    2.0
1    NaN
2    1.0

Expected Behavior

The result should be:

0    2.0
1    None
2    1.0

I checked the issue "BUG: Replacing NaN with None in Pandas 1.3 does not work", but it seems astype doesn’t work here.

Installed Versions

INSTALLED VERSIONS

commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.950

pandas : 1.4.1
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.1.1
setuptools : 56.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Top GitHub Comments

rhshadrach commented, Jun 28, 2022

Thanks for the report! I wonder if apply is coercing the result into a numeric dtype. I agree this is not expected; further investigation and PRs to fix it are welcome!

rhshadrach commented, Aug 7, 2022

@rhshadrach, given this information, would you agree that this behavior of .apply() with convert_dtype=True is unexpected?

As things stand, there are various places where we treat None as np.nan, and this behavior is consistent with that.

df = pd.DataFrame({'a': [1, 2, None]})
print(df)

#      a
# 0  1.0
# 1  2.0
# 2  NaN

df = pd.DataFrame({'a': [1, np.nan, None], 'b': [2, 3, 4]})
gb = df.groupby('a', dropna=False)
print(gb.sum())

#      b
# a     
# 1.0  2
# NaN  7

That said, I have a hunch that it would be better if pandas were to treat None as a Python object instead of np.nan. However, this issue needs more study for me to feel more certain of this.
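For readers who need None rather than NaN in the meantime, one possible workaround is to build the values in plain Python and pass dtype=object to the Series constructor, so that no numeric dtype is inferred. This is only a sketch and not a fix proposed in the thread; the names s, values, and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as column "a" in the report.
s = pd.Series([2.0, np.nan, 1.0])

# Swap missing values for None in plain Python. pd.isna is used instead of
# np.isnan because it also accepts None and other non-float objects.
values = [None if pd.isna(v) else v for v in s]

# Passing dtype=object explicitly keeps the constructor from inferring
# float64, which would turn None back into NaN.
result = pd.Series(values, index=s.index, dtype=object)

print(result)
```

Because the result has object dtype, later numeric operations on it may be slower or may convert None back to NaN, so this is probably best done as a last step, for example just before serializing the data.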
