groupby + shift drops group columns when as_index is False


The behaviour of groupby + shift seems to have changed in 0.17 and 0.18 compared to 0.16. With as_index=False, I would expect the columns that the groupby is made over to remain in the output dataframe, but they are no longer present.

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})
>>> df
     A    B
0  1.0  1.0
1  1.0  2.0
2  1.0  3.0
3  2.0  4.0
4  2.0  5.0
5  3.0  6.0
>>> df.groupby('A', as_index=False).shift(1)
     B
0  NaN
1  1.0
2  2.0
3  NaN
4  4.0
5  NaN

Expected Output

>>> pd.DataFrame({'A': [np.nan, 1, 1, np.nan, 2, np.nan], 'B': [np.nan, 1, 2, np.nan, 4, np.nan]})
     A    B
0  NaN  NaN
1  1.0  1.0
2  1.0  2.0
3  NaN  NaN
4  2.0  4.0
5  NaN  NaN
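
Until the grouped shift keeps the key column itself, the expected frame above can be assembled by hand. The following is only a sketch, not a pandas-provided fix: it assumes that grouping the key Series by itself and shifting it is an acceptable stand-in for the missing column, and the names shifted, shifted_key and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1, 1, 2, 2, 3], "B": [1.0, 2, 3, 4, 5, 6]})

# The grouped shift returns only the non-grouping columns (here: B).
shifted = df.groupby("A", as_index=False).shift(1)

# Assumption: shift the key column within its own groups by grouping
# the Series by itself, so it gets NaN on the first row of each group.
shifted_key = df["A"].groupby(df["A"]).shift(1)

# Put the shifted key back as the first column.
result = shifted.copy()
result.insert(0, "A", shifted_key)
print(result)
```

Both pieces keep the original row index, so they line up row for row when combined.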

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

I have also confirmed the issue on a similar Linux installation using pandas 0.18.1.

Issue Analytics

  • State: open
  • Created 7 years ago
  • Reactions: 3
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
vladu commented, Oct 28, 2019

Ran into the same issue today. There is a workaround. If you reset_index before the groupby, then assign still works correctly after the shift. So, if you’re only trying to shift one column, you can do:

frm = frm.assign(shifted_val=frm.groupby('key').shift(1)['val'])

Or, if you’re trying to shift the whole frame, you can assign the group col back:

shifted_frm = frm.groupby('key').shift(1)
shifted_frm = shifted_frm.assign(key=frm['key'])
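
The workaround relies on the shifted result keeping the original row index, so assign can align on it. Here is a self-contained sketch of both variants. The sample data is made up, the column names key and val follow the comment, and reset_index(drop=True) is an assumption about what "reset_index before the groupby" means (it is a no-op on this default index).

```python
import pandas as pd

# Hypothetical sample data using the column names from the comment.
frm = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [1, 2, 3, 4, 5]})

# Assumption: a fresh unique index, so assign can align rows unambiguously.
frm = frm.reset_index(drop=True)

# Variant 1: shift one column within each group and attach it.
with_shift = frm.assign(shifted_val=frm.groupby("key").shift(1)["val"])

# Variant 2: shift the whole frame, then assign the group column back.
shifted_frm = frm.groupby("key").shift(1)
shifted_frm = shifted_frm.assign(key=frm["key"])

print(with_shift)
print(shifted_frm)
```

Note that variant 2 restores the unshifted key values, so unlike the expected output in the issue, the key is not NaN on the first row of each group.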

1 reaction
thoughtfuldata commented, Mar 16, 2021

Has there been any progress on this? I’m getting the same issue as @vladu.


