Q: DataFrame `loc` and `iloc` seem to have inconsistent negative indexing behaviors.

With pandas 0.20.3, DataFrame loc and iloc show inconsistent, seemingly buggy indexing behavior with negative indices.

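import pandas as pd
df = pd.DataFrame([dict(idx=idx) for idx in range(10)])
# Python 3 ranges can't be joined with +, so convert to lists; the NaN output below is from 0.20.3 (pandas >= 1.0 raises KeyError)
print(df.loc[list(range(3)) + list(range(-3, 0)), 'idx'])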

returns NaN for the negative indices:

 0    0.0
 1    1.0
 2    2.0
-3    NaN
-2    NaN
-1    NaN

(also note that somehow the int became float…)

whereas

print(df.iloc[list(range(3)) + list(range(-3, 0))])

returns the first three rows followed by the last three rows (positions 7, 8 and 9).

Additionally, loc fails if only negative indices are passed, as in df.loc[[-2, -1], 'idx'], but not if both positive and negative ones are, as in df.loc[[0, -1], 'idx'].
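On a current setup (an assumption: Python 3 with pandas 1.0 or later), the last observation no longer holds as stated: since pandas 1.0, `.loc` raises `KeyError` if any label in the list is missing, so the mixed case fails too instead of returning NaN. A minimal sketch to check this; `label_lookup_fails` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame([dict(idx=idx) for idx in range(10)])


def label_lookup_fails(labels):
    """Hypothetical helper: True if df.loc raises KeyError for this list of labels."""
    try:
        df.loc[labels, 'idx']
    except KeyError:
        return True
    return False


only_negative = label_lookup_fails([-2, -1])  # no requested label exists: raises in 0.20.3 and today
mixed = label_lookup_fails([0, -1])           # label -1 is missing: raises in pandas >= 1.0
print(only_negative, mixed)
```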

Issue Analytics

Created 6 years ago. Comments: 6 (4 by maintainers)

Top GitHub Comments

3 reactions

jorisvandenbossche commented, Aug 18, 2017

@kingjr Have a look at the indexing docs (http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) on the different options for indexing (see also the “Selection by label” and “Selection by position” sections further down that page).

The behaviours of loc and iloc are different on purpose because they serve different goals:

- loc is label based: the negative values are not present among the index labels, hence you get missing values for them. (loc returns a result as long as at least one of the requested labels exists; in the case of df.loc[[-2, -1], 'idx'] none of them exists, and therefore it raises.)
- iloc is position based: negative indices here mean ‘start counting from the end’, so the result shown is exactly as expected.
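A minimal sketch of the two approaches side by side, assuming Python 3 and a recent pandas; the variable names are illustrative. To get the "first three and last three rows" the question expected, either stay positional with `iloc` (using `Index.get_loc` to turn the column label into a column position), or convert the positions to labels first via `df.index[...]` and then select with `loc`.

```python
import pandas as pd

df = pd.DataFrame([dict(idx=idx) for idx in range(10)])
positions = [0, 1, 2, -3, -2, -1]  # negative values count from the end, NumPy style

# Option 1: purely positional; get_loc maps the column label 'idx' to its position
by_position = df.iloc[positions, df.columns.get_loc('idx')]

# Option 2: map the positions to index labels first, then select by label
labels = df.index[positions]       # labels 0, 1, 2, 7, 8, 9
by_label = df.loc[labels, 'idx']

print(by_position.tolist())
print(by_label.tolist())
```

Both selections contain the values 0, 1, 2, 7, 8 and 9 and keep the integer dtype, because no label is missing.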

> (also note that somehow the int became float…)

This is currently a limitation of pandas: integer columns cannot represent missing values, so they are upcast to float as soon as NaN is introduced. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
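A sketch of the upcast and of one later workaround, assuming pandas 0.24 or newer, which added an optional nullable integer dtype spelled `'Int64'` (capital I). Here `Series.reindex` stands in for the old missing-label behavior of `loc`; the data mirrors the question.

```python
import pandas as pd

s = pd.Series(range(10), name='idx')  # plain int64
wanted = [0, 1, 2, -3, -2, -1]

# Missing labels become NaN, which forces the int64 values to float64
as_float = s.reindex(wanted)

# With the nullable integer dtype, missing labels become <NA> and the dtype stays Int64
as_nullable = s.astype('Int64').reindex(wanted)

print(as_float.dtype)
print(as_nullable.dtype)
print(as_nullable.isna().tolist())
```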

2 reactions

jreback commented, Aug 18, 2017

Please read the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label

.loc is for label-based indexing. The negative indices are not found and are reindexed to NaN. With .loc, an integer index is by definition indexed by label.

.iloc is always positionally indexed.
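To illustrate that an integer index is always treated as labels by `.loc`, here is a small sketch with a hypothetical Series whose integer labels happen to include -1: `loc[-1]` finds that label (the first row), while `iloc[-1]` still means the last row.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions 0, 1, 2
s = pd.Series(['a', 'b', 'c'], index=[-1, 0, 1])

first_by_label = s.loc[-1]      # label -1 is the first row -> 'a'
last_by_position = s.iloc[-1]   # position -1 is the last row -> 'c'
second_by_label = s.loc[0]      # label 0 is the second row -> 'b'
first_by_position = s.iloc[0]   # position 0 is the first row -> 'a'

print(first_by_label, last_by_position, second_by_label, first_by_position)
```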