Unpacking dictionary comprehension when using ``.assign()`` returns wrong results

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})
df.assign(**{col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns})

Problem description

Output contains the same result for all added columns:

    A    B    A_NEW    B_NEW
0   1    5    25       25
1   2    6    36       36

Expected Output

Doing the assign() calls separately gives the expected output:

(df.assign(A_NEW=lambda x: x['A'] * x['B'])
   .assign(B_NEW=lambda x: x['B'] * x['B'])
)
    A    B    A_NEW    B_NEW
0   1    5    5        25
1   2    6    12       36

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comments

This method might be “un-pandorable,” but it seems like a good way to dynamically assign columns, particularly when they rely on other columns and you might not know the names of columns, etc.

If there’s another recommended method for doing what I’m trying to accomplish, I’m all ears (or eyes). Thanks!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

geoffrey-eisenbarth commented, Jul 31, 2018

Should have done that first! New version (0.23.3)

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

It still seems to give me the wrong results. I looked through the changelog, and I'm assuming the section on dependent arguments to assign is what you're referring to? If so, the example above doesn't depend on columns generated in the call to assign, but rather on columns that are already present.

df.assign(A_NEW=lambda x: x['A'] * x['B'], B_NEW=lambda x: x['B'] * x['B'])

works properly, but

cols = {col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns}
df.assign(**cols)

doesn’t. Printing cols in shell gives

{'A_NEW': <function __main__.<dictcomp>.<lambda>>,
 'B_NEW': <function __main__.<dictcomp>.<lambda>>}

so maybe it has to do with the scope of lambda functions in dictionary comprehensions?

I see @TomAugspurger just replied, thanks to both of you for the quick responses!

TomAugspurger commented, Jul 31, 2018

I think this is a different issue, namely Python's late binding of closures: https://docs.python-guide.org/writing/gotchas/#late-binding-closures

IIUC, in your dict comprehension, col is only looked up when a lambda is called, and by then it is always bound to 'B'. It's referred to in the lambda, but isn't an argument.

In [57]: funcs = {col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns}

In [58]: funcs['A_NEW'](df)
Out[58]:
0    25
1    36
Name: B, dtype: int64
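
The same behaviour can be reproduced without pandas, which suggests it comes from Python rather than from assign(). A minimal sketch, assuming a plain dict named row as a hypothetical stand-in for the DataFrame (the names late and early are also made up for illustration):

```python
# Hypothetical stand-in for the DataFrame; not part of the original issue.
row = {'A': 1, 'B': 5}

# Late binding: each lambda looks up `col` only when it is called,
# by which time the comprehension has finished and col == 'B'.
late = {col + '_NEW': lambda x: x[col] * x['B'] for col in row}

# Passing `col` as a default argument captures its value when the
# lambda is defined, so each lambda keeps its own column name.
early = {col + '_NEW': lambda x, col=col: x[col] * x['B'] for col in row}

late_result = {name: func(row) for name, func in late.items()}
early_result = {name: func(row) for name, func in early.items()}

print(late_result)   # {'A_NEW': 25, 'B_NEW': 25}
print(early_result)  # {'A_NEW': 5, 'B_NEW': 25}
```

The default-argument form works because default values are evaluated once, at the point where the lambda is created.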

You might try something like

In [40]: def f(x):
    ...:     return x * df.B
    ...:
    ...:

In [41]: df.assign(**{col.name +'_NEW': f(col) for _, col in df.items()})
Out[41]:
   A  B  A_NEW  B_NEW
0  1  5      5     25
1  2  6     12     36
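
If you would rather keep the lambdas, another option that should work is to bind col as a default argument, so that each lambda keeps the column name it was created with. This is a sketch rather than an officially recommended pattern; new_cols and result are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})

# `col=col` stores the current column name on each lambda at definition
# time; assign() still calls each lambda with the DataFrame as `x`.
new_cols = {col + '_NEW': lambda x, col=col: x[col] * x['B']
            for col in df.columns}

result = df.assign(**new_cols)
print(result)
```

Unlike the f(col) version above, the lambdas here are still evaluated inside assign(), so they operate on whatever DataFrame assign() is called on, which may matter in the middle of a method chain.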