BUG: Explicitly declaring columns when using loc does not produce a SettingWithCopyWarning

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# Create mock dataframe
df = pd.DataFrame({
    'a': [1, 2, 3, 4]
})
m = df['a'].eq(2)

new_df1 = df[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df1._is_copy)
# SettingWithCopyWarning (expected)
new_df1['b'] = 1

new_df2 = df.loc[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df2._is_copy)
# SettingWithCopyWarning (unexpected)
new_df2['c'] = 1

new_df3 = df.loc[m, :]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df3._is_copy)
# SettingWithCopyWarning (unexpected)
new_df3['d'] = 1

new_df4 = df.loc[m, df.columns]
# None
print(new_df4._is_copy)
# No SettingWithCopyWarning (expected?)
new_df4['e'] = 1

Problem description

I expect that df[m] would produce a weakref, but I don’t understand why I’m getting a weakref for the loc options where columns are not explicitly defined.

The issue is that:

new_df2 = df.loc[m]
new_df2['b'] = 1

and

new_df3 = df.loc[m, :]
new_df3['c'] = 1

Both warn:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)

I couldn’t find any documentation saying that columns must be explicitly defined when using loc to not produce a copy, although all of the examples in the docs on Why does assignment fail when using chained indexing? do explicitly declare the column or columns.
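For reference, the pattern the warning message recommends is a single .loc[row_indexer, col_indexer] = value call on the original frame. A minimal sketch using the mock frame from the code sample (the column name b is chosen here only for illustration):

```python
import pandas as pd

# Same mock dataframe and mask as in the code sample
df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

# One indexing call on the original frame: no intermediate subset is
# created, so the assignment targets df itself.
df.loc[m, 'b'] = 1

print(df)
```

This updates df in place; rows not selected by the mask get a missing value in the new column.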

Expected Output

I would expect not to get a SettingWithCopyWarning especially since : works when slicing the Index:

filtered_df = df.loc[:, ['a']]
print(filtered_df._is_copy)  # None
filtered_df['b'] = 1  # No Warning

Output of pd.show_versions()

Confirmed on Windows

INSTALLED VERSIONS

commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : None
setuptools : None
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader : 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.05.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None

Confirmed On Mac

INSTALLED VERSIONS

commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.9.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 54.2.0
Cython : 0.29.22
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

phofl commented, Jul 25, 2021

Closing this since it is covered by the other issue.

HenryEcker commented, Jul 25, 2021

Oh. I understand now. loc returns a copy. The warning is to ensure you weren’t trying to update the original DataFrame. The bug is that explicitly selecting columns for some reason does not produce the warning.
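If the goal is an independent subset rather than an update to the original DataFrame, one common way to make that explicit is to call .copy() on the selection. A minimal sketch, reusing the mock frame from the code sample. The names subset, caught and copy_warnings are illustrative only, and matching the warning by class name is just a convenience so the snippet does not depend on where pandas exposes the class:

```python
import warnings

import pandas as pd

# Same mock dataframe and mask as in the code sample
df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

# Explicit copy: states that the subset is meant to be independent of df
subset = df.loc[m].copy()

# Record any warnings raised while adding a column to the subset
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    subset['b'] = 1

# Keep only SettingWithCopyWarning entries, matched by class name
copy_warnings = [
    w for w in caught
    if w.category.__name__ == 'SettingWithCopyWarning'
]

print(len(copy_warnings))  # number of SettingWithCopyWarning entries recorded
print(list(df.columns))    # columns of the original frame
```

The new column lands only on subset, and the original df keeps its single column.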
