Enabling chained assignment checks (SettingWithCopyWarning) can have huge performance impact
Similar to an observation on reddit, I noticed that there is a huge performance difference between the default `pd.options.mode.chained_assignment = 'warn'` and setting it to `None`.
Code Sample
import time
import pandas as pd
import numpy as np


def gen_data(N=10000):
    df = pd.DataFrame(index=range(N))
    for c in range(10):
        df[str(c)] = np.random.uniform(size=N)
    df["id"] = np.random.choice(range(500), size=len(df))
    return df


def do_something_on_df(df):
    """Dummy computation that contains inplace mutations"""
    for c in range(df.shape[1]):
        df[str(c)] = np.random.uniform(size=df.shape[0])
    return 42


def run_test(mode="warn"):
    pd.options.mode.chained_assignment = mode
    df = gen_data()
    t1 = time.time()
    for key, group_df in df.groupby("id"):
        do_something_on_df(group_df)
    t2 = time.time()
    print("Runtime: {:10.3f} sec".format(t2 - t1))


if __name__ == "__main__":
    run_test(mode="warn")
    run_test(mode=None)
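For comparison, making the per-group mutation operate on an explicit copy sidesteps the check entirely, since a freshly copied DataFrame is unambiguously not a view of the parent. A minimal sketch (the helper name `do_something_on_copy` is my variant, not part of the original report):

```python
import numpy as np
import pandas as pd


def do_something_on_copy(df):
    # An explicit copy owns its data, so pandas never needs to run the
    # chained-assignment check on assignment into it.
    df = df.copy()
    for c in range(df.shape[1]):
        df[str(c)] = np.random.uniform(size=df.shape[0])
    return 42


df = pd.DataFrame({"0": np.random.uniform(size=100),
                   "id": np.random.choice(range(5), size=100)})
for _, group_df in df.groupby("id"):
    assert do_something_on_copy(group_df) == 42
```

The trade-off is one extra copy per group, which is usually far cheaper than the per-assignment check in `'warn'` mode.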
Problem description
The run times vary a lot depending on whether the SettingWithCopyWarning is enabled or disabled. I tried with a few different Pandas/Python versions:
Debian VM, Python 3.6.2, pandas 0.21.0
Runtime: 46.693 sec
Runtime: 0.731 sec
Debian VM, Python 2.7.9, pandas 0.20.0
Runtime: 101.204 sec
Runtime: 0.622 sec
Ubuntu (host), Python 2.7.3, pandas 0.21.0
Runtime: 35.363 sec
Runtime: 0.517 sec
Ideally, there should not be such a big penalty for SettingWithCopyWarning.
From profiling results it looks like the reason might be this call to gc.collect.
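The gc.collect hypothesis is easy to sanity-check outside pandas: a full collection has to traverse every tracked container object, so its cost grows with the number of live objects in the process. A minimal micro-benchmark (the helper name `time_collect` is mine, not from the issue):

```python
import gc
import time


def time_collect(n_objects):
    # Allocate n_objects tracked containers, then time one full collection.
    objs = [{"i": i} for i in range(n_objects)]
    t0 = time.time()
    gc.collect()
    elapsed = time.time() - t0
    del objs
    return elapsed


small = time_collect(1_000)
large = time_collect(1_000_000)
print("collect with 1k objects:  {:.6f} s".format(small))
print("collect with 1M objects:  {:.6f} s".format(large))
```

This is why a `gc.collect()` per chained-assignment check becomes so expensive in sessions holding many objects, such as the groupby loop above.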
Output of pd.show_versions()
pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments: 7 (6 by maintainers)
Top GitHub Comments
It would probably be helpful to document the performance impact more clearly. This can have subtle side effects which are very hard to find. I only noticed it because a Dask/Distributed computation was much slower than expected (use case documented on SO).
Of course, this has to run the garbage collector. You can certainly just disable them. This won't be fixed in pandas 2.
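If the check is only a problem around a known-hot loop, it can be disabled locally rather than for the whole session. A sketch using `pd.option_context`, which restores the previous setting on exit (note that each group yielded by `groupby` is already a separate object, so the parent frame is not modified here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(6), "id": [0, 0, 1, 1, 2, 2]})

# Disable the chained-assignment check only inside this block; the
# previous mode is restored automatically when the block exits.
with pd.option_context("mode.chained_assignment", None):
    for _, group_df in df.groupby("id"):
        group_df["a"] = group_df["a"] * 2  # mutates the group object only

# Outside the block the prior mode is back in effect, and the parent
# frame is untouched.
print(df["a"].tolist())
```

This keeps the safety net in place for the rest of the program while avoiding the per-assignment cost in the loop.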