Unpacking dictionary comprehension when using ``.assign()`` returns wrong results

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})
df.assign(**{col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns})

Problem description

Output contains the same result for all added columns:

    A    B    A_NEW    B_NEW
0   1    5    25       25
1   2    6    36       36

Expected Output

Doing the assign() calls separately gives the expected output:

(df.assign(A_NEW=lambda x: x['A'] * x['B'])
   .assign(B_NEW=lambda x: x['B'] * x['B'])
)
    A    B    A_NEW    B_NEW
0   1    5    5        25
1   2    6    12       36

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comments

This method might be “un-pandorable,” but it seems like a good way to dynamically assign columns, particularly when they rely on other columns and you might not know the names of columns, etc.

If there’s another recommended method for doing what I’m trying to accomplish, I’m all ears (or eyes). Thanks!

Top GitHub Comments

geoffrey-eisenbarth commented, Jul 31, 2018

Should have done that first! New version (0.23.3)

pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Still seems to give me the wrong results. I looked around the changelog; I’m assuming the section on dependent arguments to assign is what you’re referring to? If so, the example above doesn’t depend on columns generated in the call to assign, but rather on columns that are already present.

df.assign(A_NEW=lambda x: x['A'] * x['B'], B_NEW=lambda x: x['B'] * x['B'])

works properly, but

cols = {col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns}
df.assign(**cols)

doesn’t. Printing cols in the shell gives

{'A_NEW': <function __main__.<dictcomp>.<lambda>,
 'B_NEW': <function __main__.<dictcomp>.<lambda>}

so maybe it has to do with the scope of lambda functions in dictionary comprehensions?

I see @TomAugspurger just replied, thanks to both of you for the quick responses!

TomAugspurger commented, Jul 31, 2018

I think this is a different issue: Python’s late binding of closures. See https://docs.python-guide.org/writing/gotchas/#late-binding-closures

IIUC, in your dict comprehension, col is only looked up when each lambda is called, and by then it is bound to the last column, 'B'. It’s referred to in the lambda, but isn’t an argument.

In [57]: funcs = {col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns}

In [58]: funcs['A_NEW'](df)
Out[58]:
0    25
1    36
Name: B, dtype: int64
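
The same behavior can be reproduced without pandas. The following is a minimal sketch, not part of the original thread; the names `getters` and `results` are illustrative only.

```python
# Each lambda refers to `col` by name; the name is looked up when the lambda is called.
getters = {col + '_NEW': (lambda: col) for col in ['A', 'B']}

# By the time any lambda runs, the comprehension has finished and `col` is 'B'.
results = {key: func() for key, func in getters.items()}
print(results)  # {'A_NEW': 'B', 'B_NEW': 'B'}
```

Both entries report 'B', which matches the identical A_NEW and B_NEW columns in the issue.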

You might try something like

In [40]: def f(x):
    ...:     return x * df.B
    ...:
    ...:

In [41]: df.assign(**{col.name +'_NEW': f(col) for _, col in df.items()})
Out[41]:
   A  B  A_NEW  B_NEW
0  1  5      5     25
1  2  6     12     36
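
If the callables need to stay lambdas (for example, inside a method chain where the frame has no name yet), one common workaround for late binding is to bind the loop variable as a default argument. This is a sketch under that assumption, not something proposed in the thread; `new_cols` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})

# `col=col` evaluates `col` when each lambda is defined,
# so every lambda keeps its own column name.
new_cols = {col + '_NEW': lambda x, col=col: x[col] * x['B'] for col in df.columns}

# assign() calls each lambda with the frame as its only argument,
# so the default value of `col` is used.
result = df.assign(**new_cols)
print(result)
```

Unlike the `f(col)` version above, which multiplies eagerly against the global `df`, this keeps each value as a callable that is evaluated against whatever frame `assign()` receives.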