Overview of [] (__getitem__) API
Some examples (on Series only) are in #12890.
I started making an overview of the indexing semantics in http://nbviewer.ipython.org/gist/jorisvandenbossche/7889b389a21b41bc1063 (only for Series/DataFrame, not for Panel).
Conclusion: it is a mess 😃
Summary for slicing
- Slicing with integer labels is:
  - always integer-location based,
  - except for a float index, where it is label based.
- Slicing with other types of labels is always label based, provided the label type is appropriate for the index.
So you can say that the behaviour is equivalent to .ix, except that the behaviour for integer labels on an integer index is different (swapped): for .ix on an integer axis, slicing is always label based, with no fallback to integer-location based.
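The slicing rules above can be condensed into a small decision function. This is a hypothetical toy model of the behaviour as summarized in this issue, not pandas code: index kinds are plain string tags ("int", "float", "object", ...), a Series is modelled as two parallel lists, and only the type of the start bound decides the mode.

```python
# Hypothetical model of the Series[a:b] rules summarized above.
# Not pandas code: index kinds are plain string tags and a "Series" is
# two parallel lists (labels, values).

def slice_mode(index_kind, start):
    """Return 'positional' or 'label' for s[start:stop] under the summarized rules."""
    if isinstance(start, int):
        # Integer bounds are positional, except on a float index
        return "label" if index_kind == "float" else "positional"
    # Any other bound type is treated as a label (assumed valid for the index)
    return "label"


def ix_slice_mode(index_kind, start):
    """Same question for .ix: integer bounds are labels on an integer or float axis."""
    if isinstance(start, int) and index_kind in ("int", "float"):
        return "label"
    return slice_mode(index_kind, start)


def toy_slice(labels, values, start, stop, index_kind):
    """Apply s[start:stop] to parallel lists of labels and values."""
    if slice_mode(index_kind, start) == "positional":
        return values[start:stop]           # end-exclusive, like list slicing
    i, j = labels.index(start), labels.index(stop)
    return values[i:j + 1]                  # label slices include the stop label


# Same bounds 1:3, different index kinds, different meaning
print(toy_slice([10, 20, 30, 40], ["a", "b", "c", "d"], 1, 3, "int"))
print(toy_slice([1.0, 2.0, 3.0, 4.0], ["a", "b", "c", "d"], 1, 3, "float"))
```

On the integer index the bounds 1:3 ignore the labels 10..40 and pick positions 1 and 2; on the float index the same bounds pick labels 1.0 through 3.0, inclusive. That swap is exactly what makes the rule hard to keep straight.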
Summary for single label
- Indexing with a single label is always label based.
- But there is a fallback to integer-location based indexing, except for integer and float indexes.
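A minimal sketch of the single-label rule, assuming the same toy representation (parallel label/value lists and a string tag for the index kind). The function name get_single is hypothetical; it only encodes "label first, then positional fallback unless the index is numeric".

```python
# Hypothetical model of s[key] for a single label, per the summary above.

def get_single(labels, values, key, index_kind):
    if key in labels:
        return values[labels.index(key)]      # label-based lookup first
    if isinstance(key, int) and index_kind not in ("int", "float"):
        return values[key]                    # positional fallback on non-numeric indexes
    raise KeyError(key)                       # integer/float index: no fallback


print(get_single(["a", "b", "c"], [1, 2, 3], "b", "object"))  # label lookup
print(get_single(["a", "b", "c"], [1, 2, 3], 0, "object"))    # falls back to position 0
```

The same expression s[0] therefore means "the first element" on an object index but "the element labelled 0" on an integer index, where a missing label raises instead of falling back.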
Summary for indexing with list of labels
- It is primarily label based, but:
  - there is a fallback to integer-location based indexing, except on an integer or float axis;
  - it is a pure reindex: even if no label in the list is found, you just get an all-NaN Series (in contrast with loc, where at least one label must be found);
  - string parsing for a datetime index does not seem to work.
This mainly follows ix, apart from points 2 and 3.
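The contrast between the pure reindex of Series[list] and .loc[list] can be sketched as follows. This is a hypothetical toy model of the summary above; in particular, the exact trigger for the positional fallback (here: a non-numeric index and all-integer keys) is an assumption, and the function names are illustrative.

```python
# Hypothetical model contrasting Series[list] (as summarized above) with .loc[list].
NaN = float("nan")


def getitem_list(labels, values, keys, index_kind):
    """Series[keys]: positional fallback on non-numeric indexes, else a pure reindex."""
    numeric = index_kind in ("int", "float")
    if not numeric and all(isinstance(k, int) for k in keys):
        return [values[k] for k in keys]        # assumed fallback trigger: all-int keys
    lookup = dict(zip(labels, values))
    return [lookup.get(k, NaN) for k in keys]   # missing labels become NaN, never raise


def loc_list(labels, values, keys):
    """.loc[keys]: reindex-like, but at least one key must be an existing label."""
    if not any(k in labels for k in keys):
        raise KeyError(keys)
    lookup = dict(zip(labels, values))
    return [lookup.get(k, NaN) for k in keys]


print(getitem_list([10, 20], [1, 2], [5, 6], "int"))  # no label found: all NaN
```

With [] a list in which no label matches silently yields all NaN, whereas the .loc model raises KeyError for the same keys.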
Summary for boolean indexing
- This is simple: it just works as expected.
Summary for DataFrames
- It uses the ‘information’ axis (axis 1) for:
  - single labels
  - lists of labels
- It uses the rows (axis 0) for:
  - slicing
  - boolean indexing
This is as documented (only the boolean case is not explicitly documented, I think).
For the rest (on the chosen axis), it follows the same semantics as [] on a Series, but:
- for a list of labels, all labels must now be present (no pure reindex as with a Series)
- for single labels, there is no fallback to integer-location based indexing for a non-numeric index (but there is a fallback for a list of labels …)
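The DataFrame dispatch described above can be sketched as a toy function. Assumptions: the frame is a dict of equal-length column lists, rows have only implicit 0..n-1 positions (so only positional row slices are modelled), and the name frame_getitem is hypothetical.

```python
# Hypothetical model of df[key] dispatch: which axis is used, per the summary above.

def frame_getitem(columns, key):
    """Toy df[key] on a dict of equal-length column lists."""
    if isinstance(key, slice):                              # slicing -> rows
        return {c: vals[key] for c, vals in columns.items()}
    if isinstance(key, list) and key and all(isinstance(k, bool) for k in key):
        n_rows = len(next(iter(columns.values()), []))      # boolean mask -> rows
        if len(key) != n_rows:
            raise ValueError("boolean mask length must equal the number of rows")
        return {c: [v for v, keep in zip(vals, key) if keep] for c, vals in columns.items()}
    if isinstance(key, list):                               # list of labels -> columns
        missing = [k for k in key if k not in columns]
        if missing:
            raise KeyError(missing)                         # all labels must be present
        return {k: columns[k] for k in key}
    return columns[key]                                     # single label -> one column


df = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(frame_getitem(df, "a"))                  # a column
print(frame_getitem(df, slice(0, 2)))          # rows 0 and 1
print(frame_getitem(df, [True, False, True]))  # rows where the mask is True
```

Note that the branch taken depends on the type of the key, so a list of booleans and a list of strings go to different axes.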
Questions are:
- Are there things we can change (that would not be too disruptive … maybe not?), and do we want to change them?
- How do we document this best?
  - Currently there is the “basics” section (http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics) and the slicing section (http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges), but these are far from covering all cases.
Issue Analytics
- Created 9 years ago
- Reactions: 7
- Comments: 20 (15 by maintainers)
xref #9213, CC @hugadams @dandavison
@jorisvandenbossche Indeed, this is a nice summary of current behavior. Thanks!
I think we should consider radical API changes for __getitem__ if we want pandas to have a lasting influence.
My two cents on indexing is that “fallback indexing” is a really bad idea. It starts with the best of intentions, but leads to special cases such as distinctions between integer and float indexes (e.g., see #9213). In the face of ambiguity, refuse the temptation to guess.
So if I were reinventing indexing rules from scratch, I would consider something like this (for DataFrame):
That’s it. Two simple rules that probably cover 90% of existing uses of __getitem__, at least the only ones that I could ever keep straight (string column labels and boolean arrays). Importantly, indexing would never depend on the type of the index and there would be no reindexing/NaN-filling behavior. We could also eliminate the need for .iloc as a separate indexer entirely.
This sort of change would require a serious deprecation cycle or perhaps need to wait until pandas 1.0 (likely both), but something needs to change. The fact that even pandas developers need to run extensive experiments to figure out how __getitem__ works indicates just how wrong things are. Indexing should be simple enough that its behavior can be relied on in production code. The current state of indexing is, frankly, embarrassing.
If I were to rebuild pandas, I would make indexing as simple as possible and only use .loc and .iloc. I would not implement __getitem__. There would be no ambiguity. I also wouldn’t allow attribute access to columns. Selecting a single column with df.loc[:, 'col'] would be a pain, but pandas really needs to focus on being explicit.
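The two-rule __getitem__ proposed earlier in the thread is described here only by its summary (string column labels and boolean arrays). A hypothetical sketch, assuming rule one is "a string or list of strings selects columns" and rule two is "a boolean list of row length selects rows"; every other key raises instead of guessing. The name strict_getitem and the dict-of-lists frame are illustrative only.

```python
# Hypothetical sketch of a two-rule df[key]: no fallback, no reindexing, no NaN filling.

def strict_getitem(columns, key):
    # Rule 1 (assumed): string label(s) select columns; missing labels raise KeyError
    if isinstance(key, str):
        return columns[key]
    if isinstance(key, list) and all(isinstance(k, str) for k in key):
        return {k: columns[k] for k in key}
    # Rule 2 (assumed): a boolean list with one entry per row selects rows
    if isinstance(key, list) and all(isinstance(k, bool) for k in key):
        n_rows = len(next(iter(columns.values()), []))
        if len(key) != n_rows:
            raise ValueError("boolean mask length must equal the number of rows")
        return {c: [v for v, keep in zip(vals, key) if keep] for c, vals in columns.items()}
    # Anything else is ambiguous: refuse to guess and point to explicit indexers
    raise TypeError(f"unsupported key for []: {key!r}; use .loc or .iloc")


df = {"a": [1, 2], "b": [3, 4]}
print(strict_getitem(df, ["b", "a"]))
print(strict_getitem(df, [False, True]))
```

The behaviour never depends on the type of the row index: integers and slices are simply rejected, which is the "refuse the temptation to guess" principle applied literally.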