Unpacking dictionary comprehension when using ``.assign()`` returns wrong results
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})
df.assign(**{col + '_NEW': lambda x: x[col] * x['B'] for col in df.columns})
Problem description
Output contains the same result for all added columns:
   A  B  A_NEW  B_NEW
0  1  5     25     25
1  2  6     36     36
Expected Output
Doing the assign() calls separately gives the expected output:
(df.assign(A_NEW=lambda x: x['A'] * x['B'])
   .assign(B_NEW=lambda x: x['B'] * x['B'])
)
   A  B  A_NEW  B_NEW
0  1  5      5     25
1  2  6     12     36
Output of pd.show_versions()
pandas: 0.22.0 pytest: None pip: 10.0.1 setuptools: 40.0.0 Cython: None numpy: 1.14.0 scipy: 1.0.0 pyarrow: None xarray: None IPython: 6.1.0 sphinx: 1.6.7 patsy: None dateutil: 2.6.1 pytz: 2018.4 blosc: None bottleneck: None tables: 3.4.3 numexpr: 2.6.5 feather: None matplotlib: None openpyxl: None xlrd: 1.0.0 xlwt: None xlsxwriter: 1.0.2 lxml: 4.1.1 bs4: 4.6.0 html5lib: None sqlalchemy: None pymysql: None psycopg2: 2.7.1 (dt dec pq3 ext lo64) jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Comments
This method might be “un-pandorable,” but it seems like a good way to dynamically assign columns, particularly when they rely on other columns and you might not know the names of columns, etc.
If there’s another recommended method for doing what I’m trying to accomplish, I’m all ears (or eyes). Thanks!
Issue Analytics
- Created 5 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Should have done that first! New version (0.23.3)
pandas: 0.23.3 pytest: None pip: 10.0.1 setuptools: 40.0.0 Cython: None numpy: 1.14.0 scipy: 1.0.0 pyarrow: None xarray: None IPython: 6.1.0 sphinx: 1.6.7 patsy: None dateutil: 2.6.1 pytz: 2018.4 blosc: None bottleneck: None tables: 3.4.3 numexpr: 2.6.5 feather: None matplotlib: None openpyxl: None xlrd: 1.0.0 xlwt: None xlsxwriter: 1.0.2 lxml: 4.1.1 bs4: 4.6.0 html5lib: None sqlalchemy: None pymysql: None psycopg2: 2.7.1 (dt dec pq3 ext lo64) jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Still seems to give me the wrong results. I looked around the changelog; I’m assuming the section on dependent arguments to assign is what you’re referring to? If so, the example above doesn’t depend on columns generated in the call to assign, but rather on columns that are already present.
[...] works properly, but [...] doesn’t. Printing cols in shell gives [...], so maybe it has to do with the scope of lambda functions in dictionary comprehensions?
I see @TomAugspurger just replied, thanks to both of you for the quick responses!
I think this is different, Python’s late binding of closures: https://docs.python-guide.org/writing/gotchas/#late-binding-closures
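For anyone who wants to see the late-binding behaviour without pandas in the picture, here is a minimal sketch. The names funcs and results are made up for illustration and do not come from the thread.

```python
# Each lambda looks up `col` when it is called, not when it is created.
funcs = {col + '_NEW': (lambda: col) for col in ['A', 'B']}

# By the time any lambda runs, the comprehension has finished
# and `col` holds its final value, 'B'.
results = {name: f() for name, f in funcs.items()}
print(results)  # {'A_NEW': 'B', 'B_NEW': 'B'}
```

Both lambdas share the same `col` variable, which is why both new columns in the report end up computed from column B.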
IIUC, in your dict-comprehension, col is always going to be bound to B. It’s referred to in the lambda, but isn’t an argument.
You might try something like
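The suggested snippet is not preserved in this copy of the thread. As a sketch of one common workaround for late binding (an assumption, not necessarily what was proposed here), the current value of col can be captured as a default argument of the lambda, since defaults are evaluated when the lambda is created:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6]})

# `col=col` is evaluated when each lambda is created, so every lambda
# keeps its own column name instead of sharing the final loop value.
result = df.assign(**{col + '_NEW': lambda x, col=col: x[col] * x['B']
                      for col in df.columns})
print(result)
```

With this change A_NEW is computed from column A and B_NEW from column B, matching the output of the two separate assign() calls in the report.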