DataFrame.replace() overwrites when values are non-numeric
Code Sample, a copy-pastable example if possible
In [1]: pd.Series([1,2,3]).replace({1 : 2, 2 : 3, 3 : 4})
Out[1]:
0 2
1 3
2 4
dtype: int64
In [2]: pd.Series(['1','2','3']).replace({'1' : '2', '2' : '3', '3' : '4'})
Out[2]:
0 4
1 4
2 4
dtype: object
Problem description
I’d expect the replacement over values in a dataframe to be non-transitive. Suppose that we would like to replace a with b, and b with c. When this replacement is applied to an entry containing the value a, the replacement rules are propagated and therefore c is returned instead of b. The same replacement is not transitive for numeric values (as shown in the example code).
I think this default behavior should be mentioned explicitly in the documentation. It would also be nice to have a Boolean option to set the transitivity on/off.
Expected Output
Out[2]:
0 2
1 3
2 4
dtype: object
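One way to get the expected output above without depending on how replace() treats object columns is to look each value up in the dict exactly once. This is only a sketch of a workaround, not a change to pandas itself; the names s, mapping and result are just for illustration.

```python
import pandas as pd

s = pd.Series(['1', '2', '3'])
mapping = {'1': '2', '2': '3', '3': '4'}

# Look every value up once; values without a rule are kept unchanged.
# Because each element is mapped a single time, rules cannot chain.
result = s.map(lambda v: mapping.get(v, v))

print(result.tolist())  # ['2', '3', '4']
```

Since every element goes through the dict only once, a value that has just been turned into '2' is never fed back into the '2' -> '3' rule.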
Output of pd.show_versions()
commit: None python: 3.5.2.final.0 python-bits: 64 OS: Darwin OS-release: 16.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.18.1 nose: 1.3.7 pip: 9.0.1 setuptools: 27.2.0 Cython: 0.25.2 numpy: 1.11.1 scipy: 0.18.1 statsmodels: 0.6.1 xarray: None IPython: 5.1.0 sphinx: 1.4.6 patsy: 0.4.1 dateutil: 2.5.3 pytz: 2016.6.1 blosc: None bottleneck: 1.1.0 tables: 3.2.3.1 numexpr: 2.6.1 matplotlib: 1.5.3 openpyxl: 2.3.2 xlrd: 1.0.0 xlwt: 1.1.2 xlsxwriter: 0.9.3 lxml: 3.6.4 bs4: 4.5.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.13 pymysql: None psycopg2: None jinja2: 2.8 boto: 2.42.0 pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
xref #5541, #5338
Yeah, I’d consider this more a bug than intended behavior. Deep in the replace code, if dtype=object is being replaced on, a recursive path is used. I'm not entirely sure why, but it could probably be changed to do only 1 pass like you’re expecting.
https://github.com/pandas-dev/pandas/blob/2522efa9e687e777d966f49af70b325922699bea/pandas/core/internals.py#L3271
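To make the difference concrete, here is a plain-Python sketch of the two strategies. It is not the pandas internals; replace_sequential and replace_single_pass are hypothetical helpers. It only assumes that applying the rules one after another over the whole column is what produces the chaining seen in the report.

```python
def replace_sequential(values, mapping):
    """Apply each rule to the whole list in turn, so earlier results
    can be picked up by later rules (transitive behaviour)."""
    out = list(values)
    for old, new in mapping.items():
        out = [new if v == old else v for v in out]
    return out


def replace_single_pass(values, mapping):
    """Look each original value up once (non-transitive behaviour)."""
    return [mapping.get(v, v) for v in values]


mapping = {'1': '2', '2': '3', '3': '4'}
values = ['1', '2', '3']

print(replace_sequential(values, mapping))   # ['4', '4', '4']
print(replace_single_pass(values, mapping))  # ['2', '3', '4']
```

The sequential version reproduces the Out[2] from the report, while the single-pass version matches the expected output.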