DataFrame.diff() with groupby raises "unexpected keyword argument 'axis'" due to built-in wrapper
Code Sample, a copy-pastable example if possible
import pandas as pd

data = pd.read_stata("myfile.dta")
data = data.set_index(['country', 'year'])
data_delta = data.groupby('country').diff()
Problem description
Hi everyone! My first bug report 😃
I’m having some problems with the .diff() method, and at first thought I was just being an idiot, but now I’m fairly confident I’ve isolated the bug.
Note, when I run this manually line-by-line it works fine, but I depend on this being inside a function (because I remove some columns before doing the differences and then reinstate them in a highly repetitive fashion).
For a long time I was on pandas 0.18.x and was using the following command fine:
data = data.groupby('country').diff().shift(-1)
But after upgrading to pandas 0.20.1, the behavior of diff seems to have changed, and it now takes a periods argument, which is very useful to me! The problem is that I now get an error every time I use it. The traceback looks like this:
Traceback (most recent call last):
  File "/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-5e1d634b8803>", line 1, in <module>
    dat = feature_expand(data_everything, lags=2, lag_y=True, delta=True)
  File "<ipython-input-3-184a59b406db>", line 126, in feature_expand
    data_delta = data_delta.diff()
  File "<string>", line 21, in diff
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 612, in wrapper
    *args, **kwargs)
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 3481, in _aggregate_item_by_item
    raise errors
TypeError: diff() got an unexpected keyword argument 'axis'
Following the traceback, I find a wrapper function in groupby.py, under _GroupBy._make_wrapper().wrapper, which says it does some “trickery for aggregation functions that need an axis” and seems to add the axis keyword argument by itself. This has probably been useful behaviour previously, but now it breaks .diff(), as it doesn’t take an axis argument anymore.
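To illustrate the failure mode described above, here is a minimal pure-Python sketch. It is not the pandas source: `diff` and `make_wrapper` are hypothetical stand-ins, and the only assumption is that the wrapper adds `axis` to the keyword arguments before calling a function that does not accept it.

```python
import functools


def diff(values, periods=1):
    """Hypothetical stand-in for a method that accepts `periods` but not `axis`."""
    return [None] * periods + [b - a for a, b in zip(values, values[periods:])]


def make_wrapper(func):
    """Hypothetical wrapper that injects `axis` unconditionally (an assumption,
    loosely modelled on the behaviour described in the report)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("axis", 0)  # added even if `func` has no such parameter
        return func(*args, **kwargs)

    return wrapper


wrapped_diff = make_wrapper(diff)

try:
    wrapped_diff([1, 3, 6])
    error_message = None
except TypeError as exc:
    # The injected keyword is rejected by the wrapped function.
    error_message = str(exc)

print(error_message)
```

Calling `diff([1, 3, 6])` directly works, while the wrapped call fails with a TypeError about the unexpected `axis` keyword, which matches the shape of the traceback above.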
I hope someone has time to help me and the community with this.
Cheers 😃
Expected Output
A dataframe of country-level first differences.
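As a rough sketch of that expected result, the snippet below uses a made-up toy frame in place of myfile.dta (the `gdp` column and all values are hypothetical). It computes the per-country difference as each value minus the group-wise shifted value, which sidesteps calling .diff() on the groupby object. This is one possible workaround under those assumptions, not necessarily how pandas computes it, and it has not been checked against 0.20.1 specifically.

```python
import pandas as pd

# Hypothetical toy data standing in for myfile.dta
data = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "B"],
        "year": [2000, 2001, 2002, 2000, 2001, 2002],
        "gdp": [1.0, 3.0, 6.0, 10.0, 15.0, 21.0],
    }
).set_index(["country", "year"])

periods = 1

# Difference within each country: value minus the value `periods` rows earlier.
# The first `periods` rows of every country have no predecessor and become NaN.
data_delta = data - data.groupby(level="country").shift(periods)

print(data_delta)
```

Because the shift happens inside each group, the first row of country B is NaN rather than the difference from the last row of country A.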
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 2.9.2
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
I suspect it’s related to #14773.
closing as duplicate