BUG: Mysterious Series.get() with Int64Index bug
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
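```python
import numpy as np
import pandas as pd


def get_t_minus_n_val(n):
    """Build an agg function returning the value n days before the group's last index label."""
    def f(x):
        # assume index is days_from_origin
        if n == 3 and x.index[-1] == 27:
            import pdb; pdb.set_trace()  # break on the group that misbehaves
        print(x.index[-1], n, x.get(x.index[-1] - n))  # for debugging
        return x.get(x.index[-1] - n, np.nan)
    f.__name__ = f"t_minus_{n}_days"
    return f


# data_df is the reporter's own DataFrame (not included in the issue)
res = data_df.set_index("days_from_origin_").groupby("device").agg({"metric1": get_t_minus_n_val(3)})
```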
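For readers without the original data, here is a self-contained sketch of what the aggregation is meant to compute. It uses hypothetical toy data (the device names and values are invented), drops the pdb breakpoint, and is not expected to trigger the bug, since the reporter could not reproduce it on small inputs:

```python
import numpy as np
import pandas as pd


def get_t_minus_n_val(n):
    """Agg function: value at label (last label - n), or NaN if that label is absent."""
    def f(x):
        return x.get(x.index[-1] - n, np.nan)
    f.__name__ = f"t_minus_{n}_days"
    return f


# Hypothetical toy data: device "a" has days 0-4, device "b" has days 0-2
data_df = pd.DataFrame({
    "device": ["a"] * 5 + ["b"] * 3,
    "days_from_origin_": [0, 1, 2, 3, 4, 0, 1, 2],
    "metric1": [10, 11, 12, 13, 14, 20, 21, 22],
})

res = data_df.set_index("days_from_origin_").groupby("device").agg({"metric1": get_t_minus_n_val(3)})
# Device "a": last label 4, so it looks up label 1 -> 11.
# Device "b": last label 2, so it looks up label -1, which is missing -> NaN.
print(res)
```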
```
> <ipython-input-452-9b903578cb56>(7)f()
-> print(x.index[-1], n, x.get(x.index[-1] - n)) # for debugging
(Pdb) x.get(24)
(Pdb) x.iloc[-5:]
23     60221064
24    232131096
25     46413584
26    133181464
27    229400712
Name: metric1, dtype: int64
(Pdb) 24 in x.index
False
(Pdb) 24 in x.index.values
True
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27],
           dtype='int64')
(Pdb) x.to_dict().get(24)
232131096
```
Problem description
I am not able to produce a minimal repro of this, as it seems to be data dependent. Instead, I am capturing the bug with my pdb debugging session above, in the hope that someone who knows the code can figure out where the problem is. I am using a custom agg function that needs to grab an element from a Series object, and even though the label clearly exists in the index (`24 in x.index.values` is True), `x.get(24)` returns None. If I first convert the Series to a dict, the lookup does return the value. I could not reproduce this by simply creating a new Series and calling `.get` on it; that works just fine. Likewise, if I filter the DataFrame down to just that device, it works just fine. It is definitely some sort of internal state issue that happens when the groupby has more records.
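A possible stopgap, sketched under the assumption that only the Index's hash-based lookup is affected (as the `24 in x.index` vs. `24 in x.index.values` discrepancy suggests): locate the label by scanning the index's NumPy values directly. `safe_get` is a hypothetical helper, not a pandas API, and it costs O(n) per call, so it is only meant as a workaround on affected versions:

```python
import numpy as np
import pandas as pd


def safe_get(s: pd.Series, label, default=np.nan):
    """Hypothetical helper: find `label` by comparing against the raw index values.

    This avoids Series.get / `label in s.index`, which go through the Index's
    hash-based lookup, and instead does an elementwise NumPy comparison.
    """
    positions = np.flatnonzero(s.index.to_numpy() == label)
    if positions.size == 0:
        return default
    # Take the first matching position (labels may repeat in a non-unique index)
    return s.iloc[positions[0]]
```

Inside the agg function, `x.get(x.index[-1] - n, np.nan)` would become `safe_get(x, x.index[-1] - n)`.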
Expected Output
Obviously, I expect `x.get(24)` to return the correct value (232131096, as shown by `x.iloc[-5:]`) instead of None.
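For comparison, a minimal sketch of the expected behavior on a freshly constructed Series. It rebuilds only the last five rows printed by `x.iloc[-5:]` in the pdb session, with an explicit int64 index (an `Int64Index` in pandas 1.0). As noted above, a fresh Series like this does not show the bug; the point is that all four lookups should agree:

```python
import numpy as np
import pandas as pd

# Last five rows from the pdb session, with an explicit int64 index
s = pd.Series(
    [60221064, 232131096, 46413584, 133181464, 229400712],
    index=pd.Index(np.arange(23, 28), dtype="int64"),
    name="metric1",
)

print(s.get(24))              # label lookup through the Index
print(24 in s.index)          # membership through the Index
print(24 in s.index.values)   # membership on the raw NumPy array
print(s.to_dict().get(24))    # lookup through a plain dict
```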
Output of pd.show_versions()
```
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 41.2.0
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0
```
Top GitHub Comments
@sam-cohan thanks for the better reproducible example and the PR!
Fixed in #32611 (i.e. in pandas 1.1).
```
fa48f5f098b5ed49a08b9d19f072559f74e1ccc2 is the first new commit
commit fa48f5f098b5ed49a08b9d19f072559f74e1ccc2
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Mar 11 21:30:02 2020 -0700
```