pd.DataFrame.__deepcopy__ does not work when elements are (nested) lists
Looks like the pandas deepcopy is no longer fully recursive. For example, if we have nested lists as shown below, the innermost lists in the copy are the very same objects as in the original.
>>> x = [[1, 2, 3, 4]]
>>> y = [[2, 3, 4, 5]]
>>> import pandas as pd
>>> df = pd.DataFrame({'x':x, 'y':y})
>>> import copy
>>> df2 = copy.deepcopy(df)
>>> id(df2['x'][0])
4572099912
>>> id(df['x'][0])
4572099912
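A possible workaround, as a minimal sketch rather than a fix in pandas itself: copy the frame, then deep-copy each element of the object-dtype columns by hand. The helper name `deepcopy_frame` is hypothetical, and the sketch assumes unique column names and that nested Python objects only live in object-dtype columns.

```python
import copy

import pandas as pd


def deepcopy_frame(df):
    """Return a copy of df whose object-dtype elements are deep-copied.

    Hypothetical helper, not part of pandas. Assumes unique column names.
    """
    # DataFrame.copy(deep=True) copies the data and index, but Python
    # objects stored in object columns are not copied recursively.
    out = df.copy(deep=True)
    for col in out.columns:
        if out[col].dtype == object:
            # Replace each element with an independent deep copy.
            out[col] = out[col].map(copy.deepcopy)
    return out


df = pd.DataFrame({'x': [[1, 2, 3, 4]], 'y': [[2, 3, 4, 5]]})
df2 = deepcopy_frame(df)

# The inner lists are now distinct objects with equal contents.
print(df2['x'].iloc[0] is df['x'].iloc[0])
print(df2['x'].iloc[0] == df['x'].iloc[0])
```

Mutating a list in `df2` after this should leave the corresponding list in `df` untouched, which is what the failing copy tests expected.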
We first noticed this issue in skbio when the copy unit tests started failing. This wasn’t a problem with the previous pandas release. Looking at the commits in the previous release, this one looks suspicious.
CC @gregcaporaso @ebolyen @jairideout
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Reactions:1
- Comments:9 (5 by maintainers)
Top GitHub Comments
embedding mutable objects inside a DataFrame is an antipattern
the community is welcome to contribute a patch