groupby + shift drops group columns when as_index is False
The behaviour of groupby + shift seems to have changed in 0.17 and 0.18 compared to 0.16. With as_index=False, I would expect the columns that the groupby is made over to remain in the output dataframe, but they are no longer present.
Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})
>>> df
     A    B
0  1.0  1.0
1  1.0  2.0
2  1.0  3.0
3  2.0  4.0
4  2.0  5.0
5  3.0  6.0
>>> df.groupby('A', as_index=False).shift(1)
     B
0  NaN
1  1.0
2  2.0
3  NaN
4  4.0
5  NaN
Expected Output
>>> pd.DataFrame({'A': [np.nan, 1, 1, np.nan, 2, np.nan], 'B': [np.nan, 1, 2, np.nan, 4, np.nan]})
     A    B
0  NaN  NaN
1  1.0  1.0
2  1.0  2.0
3  NaN  NaN
4  2.0  4.0
5  NaN  NaN
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
I have also confirmed the issue on a similar Linux installation using pandas 0.18.1.
Issue Analytics
- State:
- Created 7 years ago
- Reactions:3
- Comments:8 (4 by maintainers)
Ran into the same issue today. There is a workaround: if you reset_index before the groupby, then assign still works correctly after the shift. So, if you're only trying to shift one column, you can do:
frm = frm.assign(shifted_val=frm.groupby('key').shift(1)['val'])
Or, if you're trying to shift the whole frame, you can assign the group column back:
shifted_frm = frm.groupby('key').shift(1)
shifted_frm = shifted_frm.assign(key=frm['key'])
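For reference, here is a minimal runnable sketch of that workaround, applied to the DataFrame from the original report. The names shifted_with_key, with_prev and B_prev are made up for illustration. Note that the group column is re-attached unshifted, so it differs from the "Expected Output" above, where A is NaN in the first row of each group.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1, 1, 2, 2, 3], "B": [1.0, 2, 3, 4, 5, 6]})

# Shift only the value column(s) within each group; the result keeps
# the original row index, so it can be aligned back against df.
shifted = df.groupby("A")[["B"]].shift(1)

# Re-attach the (unshifted) group key by index alignment and restore
# the original column order.
shifted_with_key = shifted.assign(A=df["A"])[["A", "B"]]

# Alternatively, add a single shifted column to the original frame.
with_prev = df.assign(B_prev=df.groupby("A")["B"].shift(1))

print(shifted_with_key)
print(with_prev)
```

Both variants rely on the shifted result sharing the index of the input frame, which is presumably why the comment above suggests calling reset_index first when the index is not unique.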
Has there been any progress on this? I'm getting the same issue as @vladu.