BUG: specifying fill_value in pandas.DataFrame.shift() messes with index of empty dataframes

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandas.
  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

# Create an empty DataFrame and set a MultiIndex on it
empty_df = pd.DataFrame(columns=["a", "b", "c"])
multi_index_empty_df = empty_df.set_index(keys=["a", "b"])

# The index names should be the same with and without fill_value,
# but on the affected version this assertion fails
grouped = multi_index_empty_df.groupby(["a", "b"])
assert grouped.shift(1).index.names == grouped.shift(1, fill_value=0).index.names

Problem description

I discovered this problem when a unit test failed on a function that performed a groupby and shift on an empty dataframe.

Specifying fill_value in pandas.DataFrame.shift() should not alter the index that was set on a dataframe. This is not the case for empty dataframes with a multi-index, as the example above shows.

Expected Output

When executing:

multi_index_empty_df.groupby(["a","b"]).shift(1).index.names

I get the output:

FrozenList(['a', 'b'])

But when executing:

multi_index_empty_df.groupby(["a","b"]).shift(1, fill_value=0).index.names

I should get the same output, but instead I get:

FrozenList([None])
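For anyone stuck on an affected version, one possible workaround is to put the original index back after the shift. The sketch below is not from the issue: `grouped_shift_keep_index` is a hypothetical helper name, and it assumes that a grouped shift returns exactly one row per input row, in the original order, so reassigning the index is safe.

```python
import pandas as pd


def grouped_shift_keep_index(df, keys, periods=1, fill_value=None):
    """Hypothetical helper: group-wise shift that always keeps df's index."""
    grouped = df.groupby(keys)
    if fill_value is None:
        result = grouped.shift(periods)
    else:
        result = grouped.shift(periods, fill_value=fill_value)
    # Assumption: the shifted frame has one row per input row, so the
    # original index can be restored if the index names were lost.
    same_length = len(result) == len(df)
    names_differ = list(result.index.names) != list(df.index.names)
    if same_length and names_differ:
        result.index = df.index
    return result


empty_df = pd.DataFrame(columns=["a", "b", "c"])
multi_index_empty_df = empty_df.set_index(keys=["a", "b"])

shifted = grouped_shift_keep_index(multi_index_empty_df, ["a", "b"], fill_value=0)
print(list(shifted.index.names))
```

On versions where the bug is fixed, the helper leaves the result untouched, so it should be harmless to keep until an upgrade is possible.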

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2018.7
dateutil : 2.7.5
pip : 18.1
setuptools : 40.6.3
Cython : 0.29.2
pytest : 4.0.2
hypothesis : None
sphinx : 1.8.2
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader : None
bs4 : 4.6.3
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.5
matplotlib : 3.2.2
numexpr : 2.6.8
odfpy : None
openpyxl : 2.5.12
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.0.2
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : 1.2.15
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.2
numba : 0.41.0

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 17 (7 by maintainers)

Top GitHub Comments

1 reaction
benbogart commented, May 12, 2021

I did not create a unit test. I just tested in a notebook and, since the problem no longer occurred, didn’t go further. I will create a unit test, submit it, and link it here.

1 reaction
mzeitlin11 commented, May 2, 2021

Thanks for the report @brunocous. This works on master (and latest pandas version (1.2.4)), but could probably use a test.
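A regression test along the lines the maintainers mention might look like the sketch below. It is not the test that was eventually submitted: the function name is hypothetical, and the setup simply mirrors the code sample from the report. It is expected to pass only on versions where the behavior is fixed (the comment above reports 1.2.4 and master) and to fail on 1.0.5.

```python
import pandas as pd


def test_grouped_shift_fill_value_keeps_index_names():
    # Hypothetical test name; the setup mirrors the report's code sample.
    empty_df = pd.DataFrame(columns=["a", "b", "c"])
    multi_index_empty_df = empty_df.set_index(keys=["a", "b"])
    grouped = multi_index_empty_df.groupby(["a", "b"])

    shifted = grouped.shift(1)
    shifted_with_fill = grouped.shift(1, fill_value=0)

    # Both results should keep the MultiIndex names set on the input frame.
    assert list(shifted.index.names) == ["a", "b"]
    assert list(shifted_with_fill.index.names) == list(shifted.index.names)
```

Written as a plain function with bare assertions, it can be collected by pytest or called directly in a notebook.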


