
BUG: Mysterious Series.get() with Int64Index bug

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

def get_t_minus_n_val(n):
    """Return an agg function that looks up the value n days before the group's last index label."""

    def f(x):
        # assume index is days_from_origin
        if n == 3 and x.index[-1] == 27:
            import pdb; pdb.set_trace()
            print(x.index[-1], n, x.get(x.index[-1] - n))  # for debugging
        return x.get(x.index[-1] - n, np.nan)

    f.__name__ = f"t_minus_{n}_days"

    return f

# data_df is the reporter's own DataFrame (not included in the report)
res = data_df.set_index("days_from_origin_").groupby("device").agg({"metric1": get_t_minus_n_val(3)})

> <ipython-input-452-9b903578cb56>(7)f()
-> print(x.index[-1], n, x.get(x.index[-1] - n)) # for debugging
(Pdb) x.get(24)
(Pdb) x.iloc[-5:]
23     60221064
24    232131096
25     46413584
26    133181464
27    229400712
Name: metric1, dtype: int64
(Pdb) 24 in x.index
False
(Pdb) 24 in x.index.values
True
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27],
           dtype='int64')
(Pdb) x.to_dict().get(24)
232131096
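Since the real data_df is not part of the report, the following is a hedged sketch of a self-check on synthetic data: three hypothetical devices with 28 days each and random metric values. It runs a trimmed copy of the aggregation (without the pdb lines) and compares each group's result with a plain dict lookup, the route the reporter found reliable. On an unaffected pandas release the two should agree; whether this synthetic data would trigger the original failure on 1.0.1 is not established here.

```python
import numpy as np
import pandas as pd


def get_t_minus_n_val(n):
    """Trimmed copy of the report's factory: value at label (last label - n)."""

    def f(x):
        return x.get(x.index[-1] - n, np.nan)

    f.__name__ = f"t_minus_{n}_days"
    return f


# Hypothetical synthetic data: 3 devices x 28 days (the real data_df is not public).
rng = np.random.default_rng(0)
data_df = pd.DataFrame(
    {
        "device": np.repeat(["a", "b", "c"], 28),
        "days_from_origin_": np.tile(np.arange(28), 3),
        "metric1": rng.integers(0, 10**9, size=84),
    }
)

res = (
    data_df.set_index("days_from_origin_")
    .groupby("device")
    .agg({"metric1": get_t_minus_n_val(3)})
)

# Cross-check each group against a dict lookup of label 24 (= 27 - 3).
expected = {
    device: grp.set_index("days_from_origin_")["metric1"].to_dict().get(24)
    for device, grp in data_df.groupby("device")
}
for device, value in expected.items():
    print(device, res.loc[device, "metric1"], value)
```

If the printed pairs ever differ, the Series.get() path and the dict path disagree for that group.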

Problem description

I am not able to produce a minimal reproduction because the problem seems to be data dependent. Instead, I am showing my pdb debugging session above in the hope that someone who knows the code can figure out where the problem is.

Basically, I am using a custom agg function that needs to grab an element from a Series by index label. Even though the label clearly exists (24 in x.index.values is True), 24 in x.index is False and x.get(24) returns None. If I first convert the Series to a dict, the lookup does return the value. I could not reproduce this by creating a new Series and calling .get on it; that works fine. Likewise, if I filter the DataFrame down to just that device, it works fine. It is definitely some kind of internal state issue that only shows up when groupby processes more records…
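The to_dict() route works, but it builds a full dict on every call. A hedged alternative, assuming the bad state lives in the Index's lookup rather than in the underlying values (which the pdb session suggests but does not prove), is to locate the label with a NumPy comparison on x.index.values and then read the value positionally. The helper name get_by_label_values is hypothetical. It is a linear scan, acceptable for short per-group Series like these but not a general replacement for Series.get.

```python
import numpy as np
import pandas as pd


def get_by_label_values(s: pd.Series, label, default=np.nan):
    """Hypothetical helper: find `label` by scanning the raw index values.

    The comparison runs on s.index.values with NumPy, so it does not go through
    the Index's own lookup, mirroring the report's observation that
    `24 in x.index.values` was True while `24 in x.index` was False.
    """
    positions = np.flatnonzero(s.index.values == label)
    if positions.size == 0:
        return default
    # Use the first matching position; duplicate labels are not expected here.
    return s.iloc[positions[0]]


# Usage on the last five rows shown in the pdb session.
x = pd.Series(
    [60221064, 232131096, 46413584, 133181464, 229400712],
    index=[23, 24, 25, 26, 27],
    name="metric1",
)
value = get_by_label_values(x, 24)
missing = get_by_label_values(x, 99)
```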

Expected Output

Obviously, I expect x.get(24) to return the stored value (232131096) instead of None.

Output of pd.show_versions()


INSTALLED VERSIONS

commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 17.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 41.2.0
Cython           : None
pytest           : 5.4.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.4.1
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:13 (13 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, May 2, 2020

@sam-cohan thanks for the better reproducible example and the PR!

1 reaction
simonjayhawkins commented, May 1, 2020

So the good news is that this is fixed again on master.

fixed in #32611 (i.e. 1.1)

fa48f5f098b5ed49a08b9d19f072559f74e1ccc2 is the first new commit
commit fa48f5f098b5ed49a08b9d19f072559f74e1ccc2
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Mar 11 21:30:02 2020 -0700

    REF: implement _get_engine_target (#32611)
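Because the fix shipped with pandas 1.1, code that must also run on older installs could apply a workaround only there. This minimal sketch (the function name is hypothetical) parses the major and minor numbers from a version string such as pandas.__version__. The thread does not say exactly which earlier releases were affected, so it conservatively treats every release before 1.1 as potentially affected.

```python
def may_need_get_workaround(version: str) -> bool:
    """Return True for pandas versions older than 1.1 (hypothetical helper).

    Only the first two dot-separated fields are parsed, so strings such as
    "1.0.1", "1.1.0rc0" or "2.2.3" are all handled.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 1)


# Example: the reporter's version versus the release that contains the fix.
# In real code you would pass pandas.__version__ instead of a literal.
old = may_need_get_workaround("1.0.1")
new = may_need_get_workaround("1.1.0")
```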
