BUG: Series.astype is unable to handle NaN

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[2.0, np.nan, 1.0]})
df = df.astype(object)
def iif(a, b, c):
    if a:
        return b
    else:
        return c

df["a"].astype(object).apply(lambda r: iif(np.isnan(r),None,r)).astype(object).apply(lambda r: iif(np.isnan(r),None,r))
df

Issue Description

The result is:

0    2.0
1    NaN
2    1.0

Expected Behavior

The result should be:

0    2.0
1    None
2    1.0

I checked the issue "BUG: Replacing NaN with None in Pandas 1.3 does not work", but it seems astype doesn't work here.
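One possible workaround, offered only as a sketch and not as the fix adopted by pandas, is to skip apply entirely and build the object column from a plain Python list with dtype=object stated explicitly, so pandas has no chance to re-infer a float dtype. The variable names values and result are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2.0, np.nan, 1.0]})

# Replace missing values with None in plain Python first.
values = [None if pd.isna(v) else v for v in df["a"]]

# Passing dtype=object explicitly keeps None as a Python object
# instead of letting pandas infer float64 and turn it back into NaN.
result = pd.Series(values, index=df.index, dtype=object, name="a")

print(result.tolist())
```

Whether this is acceptable depends on the use case: an object column gives up the vectorized float operations that the original column had.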

Installed Versions

INSTALLED VERSIONS

commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.950

pandas : 1.4.1
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.1.1
setuptools : 56.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
rhshadrach commented, Jun 28, 2022

Thanks for the report! I wonder if apply is coercing the result into a numeric dtype. I would agree this is not expected; further investigation and PRs to fix are welcome!
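To test the coercion hypothesis, one can inspect the dtype that apply returns for an object Series. The snippet below is only a diagnostic sketch: the outcome depends on the pandas version, so no expected output is given, and the names s and result are illustrative.

```python
import numpy as np
import pandas as pd

# Start from an object-dtype Series that holds a float NaN.
s = pd.Series([2.0, np.nan, 1.0], dtype=object)

# Map missing values to None, as the reproducible example tries to do.
result = s.apply(lambda v: None if pd.isna(v) else v)

# If apply re-infers the dtype, this prints float64 and the None
# comes back as NaN; otherwise it prints object and None is kept.
print(result.dtype)
print(result.tolist())
```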

0 reactions
rhshadrach commented, Aug 7, 2022

@rhshadrach, given this information, would you agree that this behavior of .apply() with convert_dtype=True is unexpected?

Currently, there are various places where we treat None as np.nan, and this behavior is consistent with that.

df = pd.DataFrame({'a': [1, 2, None]})
print(df)

#      a
# 0  1.0
# 1  2.0
# 2  NaN

df = pd.DataFrame({'a': [1, np.nan, None], 'b': [2, 3, 4]})
gb = df.groupby('a', dropna=False)
print(gb.sum())

#      b
# a     
# 1.0  2
# NaN  7

That said, I have a hunch that it would be better if pandas were to treat None as a Python object instead of np.nan. However, this issue needs more study for me to feel more certain of this.
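The treatment of None as a missing value can also be seen directly at construction time. The following is a small sketch (the variable names inferred and explicit are illustrative): with dtype inference, None in a numeric list becomes NaN, while an explicit dtype=object keeps it as a Python object.

```python
import numpy as np
import pandas as pd

# pandas considers both None and np.nan to be missing.
print(pd.isna(None), pd.isna(np.nan))

# With dtype inference, None in a numeric list is stored as NaN.
inferred = pd.Series([1, 2, None])
print(inferred.dtype)

# With an explicit object dtype, None is kept as a Python object.
explicit = pd.Series([1, 2, None], dtype=object)
print(explicit.iloc[2] is None)
```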
