document is_copy
Code Sample, a copy-pastable example if possible
Problem description
I find the behavior of `SettingWithCopyWarning` quite surprising, but I guess that’s just what it is. It would be great if you could document `is_copy` and how to use it, though.
Whenever any function returns a dataframe, it seems like it should make sure that `is_copy` is set to `False` (or `None`?) so the user doesn’t get a warning if they change it. If you’re returning a dataframe, it’s unlikely that the user expects this to be a view, and you’re not doing chained assignments.
The `is_copy` attribute has an empty docstring in the docs and I couldn’t find any explanation of it on the website (via Google). The only thing that told me that overwriting this attribute is actually the right thing to do (again, which is pretty weird to me) was https://github.com/pandas-dev/pandas/issues/6025#issuecomment-32904245
Top GitHub Comments
This was about `train_test_split` in sklearn, see https://github.com/scikit-learn/scikit-learn/issues/8723
But basically the pattern is:
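The snippet that belonged here is not preserved. As a stand-in, here is a minimal sketch of the kind of pattern being described. The function `keep_nonnegative` and the column names are hypothetical, not taken from sklearn or from the original post. Whether a warning is actually emitted depends on the pandas version and on whether copy-on-write is enabled, so the sketch only records whatever warnings occur.

```python
import warnings

import pandas as pd


def keep_nonnegative(df):
    # Hypothetical "library" function: it returns a filtered frame
    # without calling .copy() on it.
    return df[df.A >= 0]


df = pd.DataFrame({"A": [-1, 0, 1, 2]})
result = keep_nonnegative(df)

# The caller adds a column to the returned frame. On older pandas
# versions this is where SettingWithCopyWarning may be emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result["B"] = 1

# Names of any warning categories that were raised (may be empty).
warning_names = [w.category.__name__ for w in caught]
print(warning_names)
```

Either way the assignment only affects `result`; the original `df` does not gain a `B` column.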
This is of course a contrived example, but I think the same applies whenever you have a library method that returns a sliced dataframe. If `copy` is the canonical solution, that’s fine.

Calling `.copy()` on the slice should do it. It just seems conceptually odd. If I understand the warning correctly, this means `df[df.A >= 0]` is copied twice, right? It warns me that `df[df.A >= 0]` is a copy, and to get rid of that warning I copy it (unless `.copy()` doesn’t actually copy?). If df is on the order of magnitude of the free memory, doing an additional copy can mean not being able to work on certain datasets.
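One way to look into the "copied twice" question is to compare the underlying buffers directly. This is a rough sketch, assuming a NumPy-backed integer column so that `to_numpy()` exposes the column's buffer without copying. The helper `shares` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(-2, 3)})

filtered = df[df.A >= 0]   # boolean-mask selection
copied = filtered.copy()   # explicit (deep) copy on top of it


def shares(a, b):
    # Hypothetical helper: do the "A" columns of two frames share memory?
    return np.shares_memory(a["A"].to_numpy(), b["A"].to_numpy())


print("filtered shares memory with df:", shares(df, filtered))
print("copied shares memory with filtered:", shares(filtered, copied))
```

If neither pair shares memory, the filter already allocated new data and `.copy()` allocated it again. The intermediate result can be freed once nothing references it, so the extra cost is mainly a temporary peak in memory use.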
And regarding the deprecation, I’m not married to any method. I just want a canonical way to solve the issue I described above, ideally without making unnecessary copies. I phrased the issue the way I did because the only information I could find was https://github.com/pandas-dev/pandas/issues/6025#issuecomment-32904245, in which @jreback suggests using `is_copy`, so I thought this was the canonical way of doing this.

Canonically, this is very easy to work with: simply `.copy()` after a filter assignment. Agreed that this is not the most intuitive thing, but there are many edge cases; copy-on-write fixes this but won’t be available in pandas 1.

use `.copy()` (or `.assign()`)
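For illustration, the two suggestions could look roughly like this. It is a sketch with made-up column names, not code from the thread.

```python
import pandas as pd

df = pd.DataFrame({"A": [-1, 0, 1, 2]})

# Option 1: take an explicit copy right after filtering, then modify it
# in place. The result is an independent frame, so no warning is expected.
pos = df[df.A >= 0].copy()
pos["B"] = pos["A"] * 2

# Option 2: assign() returns a new frame with the extra column instead of
# modifying the filtered frame in place.
pos2 = df[df.A >= 0].assign(B=lambda d: d["A"] * 2)

print(pos.equals(pos2))
```

Both variants leave the original `df` untouched. Which one reads better is mostly a matter of taste, since `.assign()` chains nicely while `.copy()` keeps the in-place style.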