Inconsistent result with replace(np.nan, None, inplace=True)
I’m trying to replace np.nan with None, so that I can query the parquet files from Presto with is null or is not null.
I’ve done
df.column_name.replace(np.nan, None, inplace=True)
I expected it to replace NaN with None. Instead, it filled the NaN entries of some columns with a value taken from a row where the column is not NaN.
I can’t understand why it filled those fields with other values, or why only some of the columns were filled rather than all of them, even though neither is the expected behaviour.
>>> import pandas as pd
>>> data = [
... {'hello': 1, 'mad': 2, 'world': 3},
... {'mad': 2, 'world': 3},
... {'world': 1}
... ]
>>> df = pd.DataFrame(data)
>>> df
   hello  mad  world
0    1.0  2.0      3
1    NaN  2.0      3
2    NaN  NaN      1
>>> df.hello.dropna()
0 1.0
Name: hello, dtype: float64
>>> import numpy as np
>>> df.hello.replace(np.nan, None, inplace=True)
>>> df.hello
0 1.0
1 1.0
2 1.0
Name: hello, dtype: float64
>>>
INSTALLED VERSIONS
commit: None python: 2.7.12.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-1032-aws machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None
pandas: 0.20.3 pytest: 3.2.1 pip: 9.0.1 setuptools: 36.4.0 Cython: 0.26.1 numpy: 1.13.1 scipy: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None sqlalchemy: None pymysql: None psycopg2: None jinja2: None s3fs: 0.1.2 pandas_gbq: None pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments:8 (1 by maintainers)
Top GitHub Comments
The issue you’re hitting is that None is the default argument for value. When you pass None explicitly, the replace method cannot tell it apart from the default, so it falls back to the method argument, which in this case is ‘pad’.
With the command in your example we’re ACTUALLY telling pandas:
df['hello'].replace(np.nan, method='pad')
which is why the value of 1 is being propagated down the column.
You can circumvent this behavior by passing the to_replace argument a dictionary that maps np.nan to None. Achieve the desired result by passing a dictionary into replace.
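The dictionary approach might look like the following. This is a minimal sketch that rebuilds the DataFrame from the question; the variable name cleaned is only illustrative, and the exact behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question: missing keys become NaN.
df = pd.DataFrame([
    {'hello': 1, 'mad': 2, 'world': 3},
    {'mad': 2, 'world': 3},
    {'world': 1},
])

# Map NaN to None through a dictionary, so that None is never passed
# as the `value` argument itself and no fill method is triggered.
cleaned = df['hello'].replace({np.nan: None})

print(cleaned.tolist())
```

Because a float64 column cannot hold None, the result is expected to come back with object dtype. Assigning the result to a name, rather than using inplace=True on df.hello, also avoids modifying a possibly temporary column object.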
Use df.where instead.
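A sketch of the df.where route, under the assumption that the columns are cast to object first. On recent pandas versions a float64 column may turn None back into NaN, so the astype(object) step is an addition that is not part of the original suggestion.

```python
import pandas as pd

# Same data as in the question: missing keys become NaN.
df = pd.DataFrame([
    {'hello': 1, 'mad': 2, 'world': 3},
    {'mad': 2, 'world': 3},
    {'world': 1},
])

# Cast to object so every column is able to hold None, then keep the
# values where the original frame is not null and use None elsewhere.
with_none = df.astype(object).where(df.notna(), None)

print(with_none)
```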
See: #1972 and https://stackoverflow.com/questions/14162723/replacing-pandas-or-numpy-nan-with-a-none-to-use-with-mysqldb