Overview of [] (__getitem__) API

Some examples (on Series only) are in #12890.

I started making an overview of the indexing semantics with http://nbviewer.ipython.org/gist/jorisvandenbossche/7889b389a21b41bc1063 (only for series/frame, not for panel)

Conclusion: it is a mess 😃

Summary for slicing

Slicing with integer labels is:
- always integer location based
- except for a float indexer where it is label based
Slicing with other types of labels is always label based if it is of appropriate type for the indexer.

So, you can say that the behaviour is equivalent to .ix, except that the behaviour for integer labels is swapped for integer indexers. (With .ix on an integer axis, slicing is always label based, with no fallback to integer location based.)
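To make the slicing rules concrete, here is a minimal sketch. The series names and values are made up for illustration, and the float-index case is deliberately left out because, as far as I know, its behaviour has changed between pandas releases.

```python
import pandas as pd

# Hypothetical example data: one integer-labelled and one string-labelled Series.
s_int = pd.Series([10, 20, 30, 40], index=[2, 3, 4, 5])
s_str = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Integer slice on an integer index: [] treats it as positions (end exclusive).
by_position = s_int[1:3]

# The same bounds through .loc are labels (end inclusive).
by_label = s_int.loc[2:3]

# Label slice on a string index: label based, end inclusive.
str_by_label = s_str["b":"c"]

# Integer slice on a string index: positional, end exclusive.
str_by_position = s_str[1:3]
```

Being explicit with .loc or .iloc avoids having to remember which of these cases applies.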

Summary for single label

- Indexing with a single label is always label based.
- But there is a fallback to integer location based, except for integer and float indexers.
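A small sketch of the integer-index case, where no positional fallback applies. The data is invented for illustration. The positional fallback on non-numeric indexes is not shown because, to my knowledge, it depends on the pandas version.

```python
import pandas as pd

# Hypothetical Series whose integer labels are not in positional order.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

# With an integer index, s[0] means "label 0", not "first element".
label_lookup = s[0]

# The explicit accessors make the intent unambiguous.
explicit_label = s.loc[0]
explicit_position = s.iloc[0]
```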

Summary for indexing with list of labels

It is primarily label based, but:
- There is a fallback to integer location based, except on an integer or float axis
- It is a pure reindex: even if no label of the list is found, you just get an all-NaN series (in contrast with loc, where at least one label must be found)
- String parsing for a datetime index does not seem to work

This mainly follows .ix, apart from points 2 and 3.
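The "pure reindex" behaviour can be spelled out explicitly with reindex, which fills missing labels with NaN. This is only a sketch with made-up data. What [] itself does with a missing label depends on the pandas version, so the code guards that call instead of assuming one outcome.

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# All labels present: plain label-based selection.
present = s[["a", "c"]]

# Explicit reindex: the missing label "z" becomes NaN.
reindexed = s.reindex(["a", "z"])

# Depending on the pandas version, this either reindexes (NaN for "z")
# or raises KeyError, so no single result is assumed here.
try:
    with_missing = s[["a", "z"]]
except KeyError:
    with_missing = None
```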

Summary for boolean indexing

This is simple: it just works as expected.
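For completeness, a minimal boolean-indexing sketch with invented data:

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1, 5, 3], index=["a", "b", "c"])

# A boolean Series of the same length keeps the rows where the mask is True.
mask = s > 2
selected = s[mask]
```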

Summary for DataFrames

It uses the ‘information’ axis (axis 1) for:
- single labels
- list of labels
It uses the rows (axis 0) for:
- slicing
- boolean indexing

This is as documented (only the boolean case is not explicitly documented, I think).

For the rest (on the chosen axis), it follows the same semantics as [] on a series, but:

- for a list of labels, all labels must now be present (no pure reindex as with series)
- for single labels: no fallback to integer location based for a non-numeric index (but it does fall back for a list of labels …)
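The axis rules for DataFrames can be sketched as follows. The frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Single label and list of labels select columns (axis 1).
col = df["a"]
cols = df[["a", "b"]]

# Slices and boolean masks select rows (axis 0).
rows = df[0:2]
filtered = df[df["a"] > 1]

# A list containing a label that is not a column raises KeyError.
try:
    df[["a", "missing"]]
    missing_raised = False
except KeyError:
    missing_raised = True
```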

Questions are here:

Are there things we can change (that would not be too disruptive … maybe not?), and do we want to change them?
How do we document this best?
- Right now there is the “basics” section (http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics) and the slicing section (http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges), but these do not cover all cases.

Issue Analytics

Created 9 years ago
Reactions: 7
Comments: 20 (15 by maintainers)

Top GitHub Comments

10 reactions

shoyer commented, Mar 6, 2015

xref #9213, CC @hugadams @dandavison

@jorisvandenbossche Indeed, this is a nice summary of current behavior. Thanks!

I think we should consider radical API changes for __getitem__ if we want pandas to have a lasting influence.

My two cents on indexing is that “fallback indexing” is a really bad idea. It starts with the best of intentions, but leads to special cases such as distinctions between integer and float indexes (e.g., see #9213). In the face of ambiguity, refuse the temptation to guess.

So if I were reinventing indexing rules from scratch, I would consider something like this (for DataFrame):

- Indexing with a string or list of strings does label based selection on columns.
- All other indexing is position based, NumPy style. (This includes indexing with a boolean array.)

That’s it. Two simple rules that probably cover 90% of existing uses of __getitem__, at least the only ones that I could ever keep straight (string column labels and boolean arrays). Importantly, indexing would never depend on the type of the index and there would be no reindexing/NaN-filling behavior. We could also eliminate the need for .iloc as a separate indexer entirely.
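To illustrate, the two rules could be emulated on top of today's accessors roughly like this. `proposed_getitem` is a hypothetical helper, not a pandas API, and treating a bare integer as a row position is my own reading of "NumPy style".

```python
import pandas as pd

def proposed_getitem(df, key):
    """Hypothetical sketch of the two proposed rules; not part of pandas."""
    # Rule 1: a string or a non-empty list of strings selects columns by label.
    if isinstance(key, str):
        return df.loc[:, key]
    if isinstance(key, list) and key and all(isinstance(k, str) for k in key):
        return df.loc[:, key]
    # Rule 2: everything else is positional along the rows.
    # A boolean Series is converted to a plain array so its index is ignored.
    if isinstance(key, pd.Series):
        key = key.to_numpy()
    return df.iloc[key]

# Made-up frame with integer row labels that differ from the positions.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 11, 12])

one_column = proposed_getitem(df, "a")
two_columns = proposed_getitem(df, ["a", "b"])
first_rows = proposed_getitem(df, slice(0, 2))
masked = proposed_getitem(df, df["a"] > 1)
first_row = proposed_getitem(df, 0)
```

Note that nothing in the helper looks at the type of the index, which is the point of the proposal.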

This sort of change would require a serious deprecation cycle or perhaps need to wait until pandas 1.0 (likely both), but something needs to change. The fact that even pandas developers need to run extensive experiments to figure out how __getitem__ works indicates just how wrong things are. Indexing should be simple enough that its behavior can be relied on in production code. The current state of indexing is, frankly, embarrassing.

5 reactions

tdpetrou commented, Nov 27, 2017

If I were to rebuild pandas, I would make indexing as simple as possible and only use .loc and .iloc. I would not implement __getitem__. There would be no ambiguity. I also wouldn’t allow attribute access to columns. It would be a pain to select a single column with df.loc[:, 'col'], but pandas really needs to focus on being explicit.
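For reference, this is what explicit-only column selection looks like in a small sketch. The frame and column names are made up.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"col": [1, 2, 3], "other": [4, 5, 6]})

# Explicit label-based selection of a single column.
by_label = df.loc[:, "col"]

# Explicit position-based selection of the same column.
by_position = df.iloc[:, 0]
```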
