Inconsistent result with replace(np.nan, None, inplace=True)

I’m trying to replace np.nan with None so that I can query the parquet files from Presto with is null or is not null.
I ran df.column_name.replace(np.nan, None, inplace=True).

I expected it to replace NaN with None. Instead, it fills some of the columns with values taken from where the column is not NaN.

I don’t understand why it filled these fields with other values, or why only some of the fields were filled rather than all of them, given that neither is the expected behaviour.

>>> import pandas as pd
>>> data = [
... {'hello': 1, 'mad': 2, 'world': 3},
... {'mad': 2, 'world': 3},
... {'world': 1}
... ]
>>> df = pd.DataFrame(data)
>>> df
   hello  mad  world
0    1.0  2.0      3
1    NaN  2.0      3
2    NaN  NaN      1
>>> df.hello.dropna()
0    1.0
Name: hello, dtype: float64
>>> import numpy as np
>>> df.hello.replace(np.nan, None, inplace=True)
>>> df.hello
0    1.0
1    1.0
2    1.0
Name: hello, dtype: float64
>>>

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26.1
numpy: 1.13.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
pandas_gbq: None
pandas_datareader: None

Top GitHub Comments

CRiddler commented, Sep 12, 2017

The issue you’re hitting is that None is the default for the value argument. When you pass None explicitly, replace treats it the same as the default and falls back to the method argument, which in this case is ‘pad’.

You can work around this by passing the to_replace argument a dictionary that maps np.nan to None. Building on your example:

>>> data = [
... {'hello': 1, 'mad': 2, 'world': 3},
... {'mad': 2, 'world': 3},
... {'world': 1}
... ]
>>> 
>>> df = pd.DataFrame(data)
>>> df
   hello  mad  world
0    1.0  2.0      3
1    NaN  2.0      3
2    NaN  NaN      1


>>> df['hello'].replace(np.nan, None)
0    1.0
1    1.0
2    1.0

Note that the command above actually tells pandas to run:

df['hello'].replace(np.nan, method='pad')

which is why the value 1 is propagated down the column.
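To make the 'pad' fallback concrete, here is a minimal sketch that applies a forward fill directly to the same column. It uses Series.ffill() as a stand-in for the pad method, and the variable name padded is only illustrative. It reproduces the output the issue reported on pandas 0.20.3; other pandas versions may handle an explicit None in replace differently.

```python
import pandas as pd

# Same data as in the issue: 'hello' is present only in the first row.
df = pd.DataFrame([
    {'hello': 1, 'mad': 2, 'world': 3},
    {'mad': 2, 'world': 3},
    {'world': 1},
])

# Forward fill ("pad") copies the last valid value down into each NaN,
# which matches the surprising result reported in the issue.
padded = df['hello'].ffill()
print(padded.tolist())  # [1.0, 1.0, 1.0]
```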

To get the desired result, pass a dictionary to replace:

>>> df['hello'].replace({np.nan:None})
0       1
1    None
2    None
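The same workaround as a self-contained script, in case you want to run it outside the REPL. This is a sketch rather than a guaranteed recipe: the name cleaned is illustrative, and the exact dtype and display of the result may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'hello': 1, 'mad': 2, 'world': 3},
    {'mad': 2, 'world': 3},
    {'world': 1},
])

# Passing a dict leaves no ambiguity about `value`, so pandas does not
# fall back to forward filling.
cleaned = df['hello'].replace({np.nan: None})

# Confirm that the missing entries are now real None objects.
print([value is None for value in cleaned])
```

Because a float column cannot hold None, the result is expected to come back as an object column.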

CRiddler commented, Sep 12, 2017

Use df.where instead.

>>> df
    A    B
0  1.0  NaN
1  2.0  3.0
2  3.0  1.0
3  NaN  NaN
>>> df.where(pd.notnull(df), None)
    A     B
0     1  None
1     2     3
2     3     1
3  None  None

See: #1972 and https://stackoverflow.com/questions/14162723/replacing-pandas-or-numpy-nan-with-a-none-to-use-with-mysqldb
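A runnable sketch of the where approach on the same small frame. One step here is not in the comment above: the frame is cast to object first, on the assumption that some newer pandas versions keep NaN in float64 columns when None is passed as the replacement. The name as_none is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, np.nan],
    'B': [np.nan, 3.0, 1.0, np.nan],
})

# Assumption: cast to object so the columns are able to hold None.
# Then keep each value where it is not null and use None elsewhere.
as_none = df.astype(object).where(df.notna(), None)
print(as_none)
```

The trade-off is that the columns are no longer numeric, so this is best done as a last step before handing the data to something that needs real nulls.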