DataFrame.copy(deep=True) is not a deep copy of the index


Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame(index=['a', 'b'], columns=['foo', 'muu'])
df1.index.name = "foo"
print(df1)

# create deep copy of df1 and change a value in the index
df2 = df1.copy(deep=True)
df2.index.name = "bar"
df2.index.values[0] = 'c'  # changes both df1 and df2

print(df1)
print(df2)

Problem description

DataFrame.copy(deep=True) is not a deep copy of the index.

Maybe deep should be set to True at this line?

https://github.com/pandas-dev/pandas/blob/a00154dcfe5057cb3fd86653172e74b6893e337d/pandas/core/indexes/base.py#L787
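The sharing described above can be checked without mutating anything by comparing the index buffers with NumPy's np.shares_memory. This is only a minimal sketch: the helper name index_shares_memory and the explicit dtype=object index are assumptions made for illustration, and the result depends on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def index_shares_memory(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the two row-index buffers overlap in memory."""
    return bool(np.shares_memory(left.index.values, right.index.values))


# dtype=object is an assumption that keeps the index backed by a plain NumPy array.
df1 = pd.DataFrame(index=pd.Index(["a", "b"], dtype=object), columns=["foo", "muu"])
df2 = df1.copy(deep=True)

shared = index_shares_memory(df1, df2)
print("index buffers shared after copy(deep=True):", shared)
```

If this prints True on a given installation, writing into one index's underlying array can show up in the other, which is the behaviour reported here.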

Actual Output

     foo  muu
foo          
a    NaN  NaN
b    NaN  NaN
     foo  muu
foo          
c    NaN  NaN
b    NaN  NaN
     foo  muu
bar          
c    NaN  NaN
b    NaN  NaN

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.8.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 4
  • Comments: 20 (12 by maintainers)

Top GitHub Comments

4 reactions
h-vetinari commented, Aug 13, 2018

IMO, copy(deep=True) should completely sever all connections between the original and the copied object. Compare the official Python docs (https://docs.python.org/3/library/copy.html):

"A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original."

So, IMO, deep=True should come to mean what deep='all' does currently (and the latter can then be removed).
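For reference, the standard-library semantics quoted above can be sketched with plain Python containers. The dictionary below is a made-up stand-in for illustration, not a pandas object, and this says nothing about how pandas implements copy.

```python
import copy

# A made-up nested structure standing in for "an object with an index".
original = {"index": ["a", "b"], "values": [[1, 2], [3, 4]]}

shallow = copy.copy(original)    # new dict, but the inner lists are shared
deep = copy.deepcopy(original)   # new dict and new inner lists, recursively

shallow["index"][0] = "c"        # writes through to `original`

print(original["index"])  # ['c', 'b']
print(deep["index"])      # ['a', 'b']
```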

Re:

"Indexes are immutable. Changing its underlying data is going to cause all sorts of problems."

This is not a valid argument IMO - it’s up to me as a user (consenting adults and all…) what I do with my objects, including the indexes, and if I make a deep copy, it’s a justified expectation (I would even argue: a built-in expectation of the word “deep”) that this will not mess with the original.

Plus, if I’m already deep-copying the much larger values of a DF, not copying the index only saves a comparatively irrelevant amount of memory.
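Until the behaviour changes, one possible workaround is to deep-copy the row index explicitly after copying the frame. This is a sketch under assumptions: copy_with_independent_index is a hypothetical helper name, the index is forced to dtype=object so it is backed by a NumPy array, and only the row index is handled (the columns are left as copy(deep=True) produces them).

```python
import numpy as np
import pandas as pd


def copy_with_independent_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: deep-copy the data, then swap in a deep copy of the row index."""
    out = df.copy(deep=True)
    # Index.copy(deep=True) also copies the index's underlying data.
    out.index = df.index.copy(deep=True)
    return out


df1 = pd.DataFrame(index=pd.Index(["a", "b"], dtype=object), columns=["foo", "muu"])
df1.index.name = "foo"

df2 = copy_with_independent_index(df1)
df2.index.name = "bar"

# The two row indexes no longer share a buffer, and df1 keeps its own name.
print(np.shares_memory(df1.index.values, df2.index.values))
print(df1.index.name, df2.index.name)
```

The extra copy costs memory proportional to the index only, which matches the point above that the index is usually small compared with the values.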

2 reactions
skaae commented, Feb 23, 2018

OK. I think the documentation of copy is unclear then: "Make a deep copy, including a copy of the data and the indices."

