BUG: Explicitly declaring columns when using loc does not produce a SettingWithCopyWarning
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
# Create mock dataframe
df = pd.DataFrame({
'a': [1, 2, 3, 4]
})
m = df['a'].eq(2)
new_df1 = df[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df1._is_copy)
# SettingWithCopyWarning (expected)
new_df1['b'] = 1
new_df2 = df.loc[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df2._is_copy)
# SettingWithCopyWarning (unexpected)
new_df2['c'] = 1
new_df3 = df.loc[m, :]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df3._is_copy)
# SettingWithCopyWarning (unexpected)
new_df3['d'] = 1
new_df4 = df.loc[m, df.columns]
# None
print(new_df4._is_copy)
# No SettingWithCopyWarning (expected?)
new_df4['e'] = 1
Problem description
I expect that df[m] would produce a weakref, but I don’t understand why I’m getting a weakref for the loc options where columns are not explicitly defined.
The issue being that:
new_df2 = df.loc[m]
new_df2['b'] = 1
and
new_df3 = df.loc[m, :]
new_df3['c'] = 1
Both warn:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._set_item(key, value)
I couldn’t find any documentation saying that columns must be explicitly defined when using loc to not produce a copy, although all of the examples in the docs on "Why does assignment fail when using chained indexing?" do explicitly declare the column or columns.
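For reference, the form the warning text recommends (.loc[row_indexer,col_indexer] = value) can be sketched as below. This is a minimal illustration that assumes the goal is to update the original DataFrame rather than a filtered result; the column name 'b' is only an example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

# One .loc call on the original frame: the mask selects the rows and the
# column is named explicitly, so no intermediate object is involved.
df.loc[m, 'b'] = 1

# Rows outside the mask have no value for the new column and hold NaN.
print(df)
```

Because the assignment goes through a single indexing call on df itself, pandas has no intermediate selection to warn about.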
Expected Output
I would expect not to get a SettingWithCopyWarning, especially since : works when slicing the Index:
filtered_df = df.loc[:, ['a']]
print(filtered_df._is_copy) # None
filtered_df['b'] = 1 # No Warning
Output of pd.show_versions()
Confirmed on Windows
INSTALLED VERSIONS
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : None
setuptools : None
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader : 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.05.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None
Confirmed On Mac
INSTALLED VERSIONS
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.9.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 54.2.0
Cython : 0.29.22
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
Issue Analytics
- Created 2 years ago
- Comments: 12 (8 by maintainers)
Closing this since it is covered by the other issue.
Oh. I understand now. loc returns a copy. The warning is to ensure you weren’t trying to update the original DataFrame. The bug is that explicitly selecting columns for some reason does not produce the warning.
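If the filtered result is meant to be an independent DataFrame, one common way to make that intent explicit is to call .copy() on the selection before modifying it. The sketch below is a hedged illustration and not part of the original report. It assumes that checking the warning's class name is an acceptable way to detect SettingWithCopyWarning, so that no import of the warning class itself is needed.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The explicit copy states that new_df is independent of df.
    new_df = df.loc[m].copy()
    new_df['b'] = 1

# Keep only warnings whose class is named SettingWithCopyWarning.
swc = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(swc))
print(list(df.columns))
```

With the explicit copy, the new column lands only on new_df and the original df keeps its single column.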