pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError

Code Sample, a copy-pastable example if possible

For:


result = a2b.loc[vals]       # pd.Series()[ np.array ]

If a2b is a Series that maps {int64: int64} and vals is an int64 array, the result should be a Series that maps {int64: int64}, or a KeyError should be raised.
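A minimal sketch of the semantics asked for above, assuming the goal is "keep int64 or raise": check for missing labels up front, and only call `.loc` once every label is known to exist, so pandas never needs to insert NaN. `strict_loc` is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd


def strict_loc(s: pd.Series, labels) -> pd.Series:
    """Hypothetical helper: label-based selection that raises KeyError on any missing label.

    Because every requested label exists by the time .loc runs, no NaN is
    introduced and the original int64 dtype of the data is preserved.
    """
    labels = np.asarray(labels)
    # Boolean mask over the *requested* labels: True where the label exists in the index.
    present = np.isin(labels, np.asarray(s.index))
    if not present.all():
        raise KeyError(f"labels not found in index: {labels[~present].tolist()}")
    return s.loc[labels]


a2b = pd.Series(
    data=np.array([10, 20, 30], dtype=np.int64),
    index=np.array([1, 2, 3], dtype=np.int64),
)
print(strict_loc(a2b, [3, 1]).dtype)  # int64
```

The up-front check trades one extra membership test for a guarantee that the dtype never silently changes.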

Pasteable repro:

       import pandas as pd
       import numpy as np
       a2b = pd.Series(
           index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                     9725301000001109,  9725601000001103,  9725801000001104,
                     9730701000001104, 10049011000001109, 10328511000001105]),
           data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104])
       )
       assert a2b.dtype==np.int64
       assert a2b.index.dtype==np.int64
       key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                             9725301000001109,  9725601000001103,  9725801000001104,
                             9730701000001104,
                             10047311000001102, # Missing from a2b.index
                             10049011000001109,
                             10328511000001105])
       result = a2b.loc[key]
       result
       assert result.dtype==np.int64
       assert result.index.dtype==np.int64

What happens:

In [2]:         import pandas as pd
   ...:         import numpy as np
   ...:         a2b = pd.Series(
   ...:             index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                       9725301000001109,  9725601000001103,  9725801000001104,
   ...:                       9730701000001104, 10049011000001109, 10328511000001105]),
   ...:             data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104])
   ...:         )
   ...:         assert a2b.dtype==np.int64
   ...:         assert a2b.index.dtype==np.int64
   ...:         key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                               9725301000001109,  9725601000001103,  9725801000001104,
   ...:                               9730701000001104,
   ...:                               10047311000001102, # Missing from a2b.index
   ...:                               10049011000001109,
   ...:                               10328511000001105])
   ...:         result = a2b.loc[key]
   ...:         result
   ...: 
Out[2]: 
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
NaN             NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
dtype: float64

In [3]:         assert result.dtype==np.int64
   ...:         assert result.index.dtype==np.int64
   ...: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-be86ec17a393> in <module>()
----> 1 assert result.dtype==np.int64
      2 assert result.index.dtype==np.int64

AssertionError: 

Problem description

I don’t like this behavior because:

  • I have quietly lost all my data due to the cast to float64
  • in other calls to __getitem__, a KeyError is raised if a label is not found in the index.
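The precision concern in the first point can be checked with NumPy alone. float64 has a 53-bit significand, so not every integer above 2**53 is representable, and both the data values (about 9.99e17) and the index labels (about 9.7e15) in the repro lie above that bound. A minimal check, reusing one value and one label from the repro:

```python
import numpy as np

value = np.int64(999000011000001104)  # one of the data values from the repro
label = np.int64(9724501000001103)    # one of the index labels from the repro

# float64 represents every integer exactly only up to 2**53.
limit = 2 ** 53
print(value > limit, label > limit)  # True True

# Round-tripping through float64 (what a NaN-introducing upcast does) changes both numbers.
print(int(np.float64(value)) == int(value))  # False
print(int(np.float64(label)) == int(label))  # False
```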

Expected Output

Asserts should not fail.

Output of pd.show_versions()

In [4]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 16 (10 by maintainers)

Top GitHub Comments

cbertinato commented, Apr 3, 2019 (1 reaction)

I see the KeyError with __getitem__, but the warning shown when running a2b.loc[key] indicates that, while no error is raised now, one will be in the future. It seems to me that, while the current behavior is not ideal, it is expected.
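The warning referred to here is the deprecation (introduced in pandas 0.21) of passing list-likes with missing labels to `.loc`, which suggests `.reindex()` as the alternative; in pandas 1.0 and later `.loc` raises `KeyError` in this case. If missing labels should instead come back as missing values, a sketch that keeps the integers exact is to convert to the nullable `Int64` extension dtype (assumed available, pandas 0.24 or later) before reindexing, so the gap is filled with `<NA>` instead of forcing a float64 upcast. The three-element Series below is a trimmed-down, hypothetical version of the repro.

```python
import numpy as np
import pandas as pd

a2b = pd.Series(
    data=np.full(3, 999000011000001104, dtype=np.int64),
    index=np.array([9724501000001103, 9724701000001109, 9725101000001107], dtype=np.int64),
)
# The middle label is not in a2b.index.
key = np.array([9724501000001103, 10047311000001102, 9725101000001107], dtype=np.int64)

# Plain reindex: the missing label forces an upcast to float64, which rounds the values.
lossy = a2b.reindex(key)
print(lossy.dtype)  # float64

# Nullable integer dtype: the missing label becomes <NA>, present values stay exact.
exact = a2b.astype("Int64").reindex(key)
print(exact.dtype)            # Int64
print(exact.isna().tolist())  # [False, True, False]
```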

But I’m still trying to sleuth out whether there’s another issue here. @jreback do you mean that because one of the labels in key is not in a2b, then that label shouldn’t show up in the index of a2b.loc[key]?

jreback commented, Mar 30, 2019 (1 reaction)

Use .iloc, as that is what is designed for selecting by position, as the docs indicate: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-position
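Following that suggestion, one sketch for label lookups is to translate labels to positions with `Index.get_indexer`, which returns -1 for labels it cannot find, reject those explicitly, and then select with `.iloc`. The helper name `select_by_labels` is hypothetical, and this use of `get_indexer` assumes a unique index.

```python
import numpy as np
import pandas as pd


def select_by_labels(s: pd.Series, labels) -> pd.Series:
    """Hypothetical helper: map labels to positions, fail loudly on misses, select with .iloc."""
    positions = s.index.get_indexer(labels)  # -1 marks a label that is not in the index
    if (positions == -1).any():
        missing = np.asarray(labels)[positions == -1]
        raise KeyError(f"labels not found in index: {missing.tolist()}")
    return s.iloc[positions]


a2b = pd.Series(
    data=np.array([10, 20, 30], dtype=np.int64),
    index=np.array([7, 8, 9], dtype=np.int64),
)
print(a2b.iloc[[2, 0]].tolist())               # [30, 10]  (positions, not labels)
print(select_by_labels(a2b, [9, 7]).tolist())  # [30, 10]  (labels mapped to positions)
```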

__getitem__ is falling back here, as described in http://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#miscellaneous-indexing-faq

This is expected behavior and is not likely to change.
