Inconsistent result with replace(np.nan, None, inplace=True)


I’m trying to replace np.nan with None so that I can query the resulting parquet files from Presto with conditions like is null or is not null.
To do that, I ran df.column_name.replace(np.nan, None, inplace=True).

I expected it to replace each NaN with None. Instead, it fills some of the NaN entries with values taken from rows where the column is not NaN.

I don’t understand why it fills those fields with other values, or why only some fields are filled and not all of them. Is this expected behaviour?

>>> import pandas as pd
>>> data = [
... {'hello': 1, 'mad': 2, 'world': 3},
... {'mad': 2, 'world': 3},
... {'world': 1}
... ]
>>> df = pd.DataFrame(data)
>>> df
   hello  mad  world
0    1.0  2.0      3
1    NaN  2.0      3
2    NaN  NaN      1
>>> df.hello.dropna()
0    1.0
Name: hello, dtype: float64
>>> import numpy as np
>>> df.hello.replace(np.nan, None, inplace=True)
>>> df.hello
0    1.0
1    1.0
2    1.0
Name: hello, dtype: float64
>>>

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26.1
numpy: 1.13.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

25 reactions
CRiddler commented, Sep 12, 2017

The issue you’re hitting is that None is the default for the value argument. When you pass None as value, replace cannot tell it apart from "no value given", so it falls back to the method argument, which in this case is ‘pad’.

You can circumvent this behavior by passing a dictionary that maps np.nan to None as the to_replace argument. Building on your example…

>>> data = [
... {'hello': 1, 'mad': 2, 'world': 3},
... {'mad': 2, 'world': 3},
... {'world': 1}
... ]
>>> 
>>> df = pd.DataFrame(data)
>>> df
   hello  mad  world
0    1.0  2.0      3
1    NaN  2.0      3
2    NaN  NaN      1


>>> df['hello'].replace(np.nan, None)
0    1.0
1    1.0
2    1.0

Note that with the above command we’re ACTUALLY telling pandas:

df['hello'].replace(np.nan, method='pad')

which is why the value of 1 is being propagated down the column.
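To make the ‘pad’ fallback concrete, here is a minimal sketch that reproduces the same propagation with an explicit forward fill. It uses Series.ffill as a stand-in for the implicit method='pad' described above (that substitution is an assumption made for illustration, since the behaviour of replace with a None value may differ in pandas releases newer than the 0.20.3 used in this thread).

```python
import pandas as pd

# Same data as in the question; rows 1 and 2 have no 'hello' key,
# so pandas stores NaN there.
df = pd.DataFrame([
    {'hello': 1, 'mad': 2, 'world': 3},
    {'mad': 2, 'world': 3},
    {'world': 1},
])

# Forward fill ("pad"): each NaN takes the last valid value above it.
# Stand-in for the implicit method='pad' fallback, not the same call.
padded = df['hello'].ffill()
print(padded.tolist())  # [1.0, 1.0, 1.0]
```

The only valid value in the column is 1.0, so padding copies it into every NaN below it, which matches the output in the question.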

You can achieve the desired result by passing a dictionary to replace:

>>> df['hello'].replace({np.nan:None})
0       1
1    None
2    None

6 reactions
CRiddler commented, Sep 12, 2017

Use df.where instead.

>>> df
    A    B
0  1.0  NaN
1  2.0  3.0
2  3.0  1.0
3  NaN  NaN
>>> df.where(pd.notnull(df), None)
    A     B
0     1  None
1     2     3
2     3     1
3  None  None

See: #1972 and https://stackoverflow.com/questions/14162723/replacing-pandas-or-numpy-nan-with-a-none-to-use-with-mysqldb
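A hedged variant of the df.where approach, in case the float columns keep their NaN values on your pandas version: cast to object first so the columns are able to hold None. This is a sketch under that assumption, not a statement of how every pandas release behaves; the name as_none is purely illustrative.

```python
import pandas as pd

# Same shape of data as the example above; None in a numeric
# column is stored as NaN (float64).
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, None],
    'B': [None, 3.0, 1.0, None],
})

# Cast to object so the columns can hold Python None, then keep
# the non-null cells and put None everywhere else.
as_none = df.astype(object).where(df.notna(), None)
print(as_none)
```

The cast is the design choice that matters here: a float64 column has no way to store None, so the replacement can only stick once the column dtype is object.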
