groupby + shift drops group columns when as_index is False

Using groupby + shift seems to have changed behaviour in 0.17 and 0.18 compared to 0.16. With as_index=False, I would expect the columns that the groupby is made over to remain in the output dataframe, but they are no longer present.

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})
>>> df
     A    B
0  1.0  1.0
1  1.0  2.0
2  1.0  3.0
3  2.0  4.0
4  2.0  5.0
5  3.0  6.0
>>> df.groupby('A', as_index=False).shift(1)
     B
0  NaN
1  1.0
2  2.0
3  NaN
4  4.0
5  NaN

Expected Output

>>> pd.DataFrame({'A': [np.nan, 1, 1, np.nan, 2, np.nan], 'B': [np.nan, 1, 2, np.nan, 4, np.nan]})
     A    B
0  NaN  NaN
1  1.0  1.0
2  1.0  2.0
3  NaN  NaN
4  2.0  4.0
5  NaN  NaN
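
On versions that drop the group column, one way to get a frame like the expected output is to shift the key column separately and join it back. This is a minimal sketch, assuming the `df` from the code sample above. Grouping the `A` Series by its own values is just one option, not something proposed in the issue, and the names `shifted_key`, `shifted_val` and `expected_like` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})

# Shift the key column within its own groups (a Series grouped by its own values).
shifted_key = df['A'].groupby(df['A']).shift(1)

# Shift the value column within each group of A.
shifted_val = df.groupby('A')['B'].shift(1)

# Recombine both columns; shift keeps the original index, so they align row by row.
expected_like = pd.concat([shifted_key, shifted_val], axis=1)
print(expected_like)
```

Both pieces keep the original row index, so the concatenation lines up without any extra sorting or merging.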

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

I have also confirmed the issue on a similar Linux installation using pandas 0.18.1.

Top GitHub Comments

2 reactions

vladu commented, Oct 28, 2019

Ran into the same issue today. There is a workaround. If you reset_index before the groupby, then assign still works correctly after the shift. So, if you’re only trying to shift one column, you can do:

frm = frm.assign(shifted_val=frm.groupby('key').shift(1)['val'])

Or, if you’re trying to shift the whole frame, you can assign the group col back:

shifted_frm = frm.groupby('key').shift(1)
shifted_frm = shifted_frm.assign(key=frm['key'])
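
Both variants of the workaround can be tried end to end with a small frame. This is a sketch only: the sample data and the name `one_col` are made up for illustration, and only `frm`, `key`, `val` and `shifted_val` come from the comment above. It assumes a unique index (for example after `reset_index`), since `assign` aligns on index labels.

```python
import pandas as pd

# Hypothetical sample data; the default RangeIndex is unique, as the workaround needs.
frm = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'b'], 'val': [1, 2, 3, 4, 5]})

# Variant 1: shift one column within each group and attach it as a new column.
one_col = frm.assign(shifted_val=frm.groupby('key').shift(1)['val'])

# Variant 2: shift the whole frame, then assign the (unshifted) group column back.
shifted_frm = frm.groupby('key').shift(1)
shifted_frm = shifted_frm.assign(key=frm['key'])

print(one_col)
print(shifted_frm)
```

Note that the second variant restores the original key values on every row. It does not set them to NaN on the first row of each group, so it differs from the expected output given in the issue.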

1 reaction

thoughtfuldata commented, Mar 16, 2021

Has there been any progress on this? I'm getting the same issue as @vladu.