
Overview of [] (__getitem__) API


Some examples (on Series only) are in #12890.

I started making an overview of the indexing semantics in this notebook: http://nbviewer.ipython.org/gist/jorisvandenbossche/7889b389a21b41bc1063 (only for Series/DataFrame, not for Panel).

Conclusion: it is a mess 😃


Summary for slicing

  • Slicing with integer labels is:
    • always integer-location based
    • except on a float index, where it is label based
  • Slicing with other types of labels is always label based, provided the labels are of the appropriate type for the index.

So you can say that the behaviour is equivalent to .ix, except that for integer labels on an integer index the behaviour is swapped: [] slices by position, whereas .ix is always label based on an integer axis, with no fallback to integer-location based indexing.
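To make the slicing rules above concrete, here is a minimal decision model. It is a hypothetical sketch, not pandas source code: the function names and the string "index kinds" ("int", "float", anything else) are assumptions for illustration, and bounds are assumed to already be of a type appropriate for the index.

```python
# Hypothetical decision model, NOT pandas source code: it encodes the
# slicing summary for 2015-era Series.__getitem__ and .ix.
# Index kinds are plain strings: "int", "float" or anything else (e.g. "object").


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def getitem_slice_semantics(index_kind: str, start=None, stop=None) -> str:
    """Return "position" or "label" for s[start:stop]."""
    bounds = [b for b in (start, stop) if b is not None]
    if all(_is_int(b) for b in bounds):
        # Integer bounds slice by position, except on a float index.
        return "label" if index_kind == "float" else "position"
    # Other bounds (strings, dates, ...) are label based.
    return "label"


def ix_slice_semantics(index_kind: str, start=None, stop=None) -> str:
    """Same question for s.ix[start:stop]: only the integer-index case differs."""
    bounds = [b for b in (start, stop) if b is not None]
    if all(_is_int(b) for b in bounds) and index_kind == "int":
        # .ix is always label based on an integer axis (no positional fallback).
        return "label"
    return getitem_slice_semantics(index_kind, start, stop)


print(getitem_slice_semantics("int", 1, 3))  # position
print(ix_slice_semantics("int", 1, 3))       # label
```

The only row where the two functions disagree is integer bounds on an integer index, which is exactly the "swapped" case.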

Summary for single label

  • Indexing with a single label is always label based
  • But there is a fallback to integer-location based indexing, except for integer and float indexes
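The single-label rule can be sketched the same way. Again this is a hypothetical model, not pandas code, and it assumes that the label lookup is tried first and the positional fallback only applies when the label is missing.

```python
# Hypothetical decision model, NOT pandas source code, of the single-label
# summary. Assumption: the label lookup is tried first and the positional
# fallback only kicks in when the label is missing.


def single_label_semantics(index_kind: str, key, key_in_index: bool) -> str:
    """Return "label", "position" or "KeyError" for s[key]."""
    if key_in_index:
        return "label"
    is_int_key = isinstance(key, int) and not isinstance(key, bool)
    if is_int_key and index_kind not in ("int", "float"):
        # e.g. s[0] on an object index falls back to the first element.
        return "position"
    # On int/float indexes a missing integer key is simply a missing label.
    return "KeyError"
```

This is why s[0] means "first element" on a string index but "the label 0" on an integer index.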

Summary for indexing with list of labels

  • It is primarily label based, but:
    • There is a fallback to integer-location based indexing, except for an int/float axis
    • It is a pure reindex: even if no label in the list is found, you just get an all-NaN Series (in contrast with loc, where at least one label must be found)
    • String parsing for a datetime index does not seem to work

This mainly follows ix, apart from the second and third points (the pure reindex and the datetime string parsing).
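The reindex difference between [] and loc with a list of labels can be illustrated with plain dicts standing in for a Series (label to value). This is a hypothetical sketch of the 2015-era behaviour described above, not pandas code; the function names are made up and the positional fallback is left out. Later pandas versions removed this reindexing behaviour for lists with missing labels.

```python
import math

# Hypothetical model, NOT pandas source code: plain dicts stand in for a
# Series (label -> value). The positional fallback is left out.


def getitem_list(data: dict, labels: list) -> list:
    """2015-era s[list_of_labels]: a pure reindex; missing labels become NaN."""
    return [data.get(label, math.nan) for label in labels]


def loc_list(data: dict, labels: list) -> list:
    """2015-era s.loc[list_of_labels]: a reindex, but at least one label must exist."""
    if not any(label in data for label in labels):
        raise KeyError(labels)
    return [data.get(label, math.nan) for label in labels]


s = {"a": 1, "b": 2}
print(getitem_list(s, ["x", "y"]))  # [nan, nan]
```

With the same input, loc_list(s, ["x", "y"]) raises KeyError, while loc_list(s, ["a", "x"]) still returns a partially NaN result.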

Summary for boolean indexing

  • This is simple: it just works as expected

Summary for DataFrames

  • It uses the ‘information’ axis (axis 1) for:
    • single labels
    • list of labels
  • It uses the rows (axis 0) for:
    • slicing
    • boolean indexing

This is as documented (only the boolean case is not explicitly documented, I think).

For the rest (on the chosen axis), it follows the same semantics as [] on a Series, but:

  • for a list of labels, all labels must now be present (no pure reindex as with Series)
  • for single labels: no fallback to integer-location based indexing for a non-numeric index (but there is a fallback for a list of labels …)
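The axis choice for DataFrame[] described above boils down to a dispatch on the key type. The sketch below is a hypothetical model, not pandas code; as a simplification it represents a boolean mask as a Python list of bools, whereas real code usually passes a boolean Series or array.

```python
# Hypothetical decision model, NOT pandas source code, of which axis
# 2015-era DataFrame.__getitem__ uses for each kind of key.


def frame_getitem_axis(key) -> str:
    """Return "rows" (axis 0) or "columns" (axis 1) for df[key]."""
    if isinstance(key, slice):
        return "rows"  # slicing selects rows
    if isinstance(key, list) and key and all(isinstance(k, bool) for k in key):
        return "rows"  # boolean indexing selects rows (mask modelled as a list)
    return "columns"  # single label or list of labels


print(frame_getitem_axis("a"))          # columns
print(frame_getitem_axis(slice(0, 2)))  # rows
```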

Questions are here:

Issue Analytics

  • State: open
  • Created 9 years ago
  • Reactions: 7
  • Comments: 20 (15 by maintainers)

Top GitHub Comments

10 reactions
shoyer commented, Mar 6, 2015

xref #9213, CC @hugadams @dandavison

@jorisvandenbossche Indeed, this is a nice summary of current behavior. Thanks!

I think we should consider radical API changes for __getitem__ if we want pandas to have a lasting influence.

My two cents on indexing is that "fallback indexing" is a really bad idea. It starts with the best of intentions, but leads to special cases such as the distinction between integer and float indexes (e.g., see #9213). In the face of ambiguity, refuse the temptation to guess.

So if I were reinventing indexing rules from scratch, I would consider something like this (for DataFrame):

  • Indexing with a string or list of strings does label based selection on columns.
  • All other indexing is position based, NumPy style. (This includes indexing with a boolean array.)

That’s it. Two simple rules that probably cover 90% of existing uses of __getitem__, at least the only ones that I could ever keep straight (string column labels and boolean arrays). Importantly, indexing would never depend on the type of the index and there would be no reindexing/NaN-filling behavior. We could also eliminate the need for .iloc as a separate indexer entirely.

This sort of change would require a serious deprecation cycle or perhaps need to wait until pandas 1.0 (likely both), but something needs to change. The fact that even pandas developers need to run extensive experiments to figure out how __getitem__ works indicates just how wrong things are. Indexing should be simple enough that its behavior can be relied on in production code. The current state of indexing is, frankly, embarrassing.
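The two rules proposed in this comment can be sketched as a plain function on top of the existing .loc and .iloc indexers. The name proposed_getitem is hypothetical and this is not a pandas API; it only illustrates that the dispatch depends on the key type, never on the index type.

```python
import numpy as np
import pandas as pd


def proposed_getitem(df: pd.DataFrame, key):
    """Hypothetical two-rule __getitem__ (not a pandas API).

    Rule 1: a string or a non-empty list of strings selects columns by label.
    Rule 2: anything else is NumPy-style positional indexing on rows,
    including integer lists, slices and boolean arrays.
    """
    if isinstance(key, str) or (
        isinstance(key, list) and key and all(isinstance(k, str) for k in key)
    ):
        return df.loc[:, key]
    return df.iloc[key]


# The integer index is deliberately non-default: positions still win.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 20, 30])
first_row = proposed_getitem(df, 0)                              # row at position 0 (label 10)
masked = proposed_getitem(df, np.array([True, False, True]))  # rows 10 and 30
```

Because rule 2 never consults the index, there is no fallback and no NaN-filling reindex to reason about.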

5 reactions
tdpetrou commented, Nov 27, 2017

If I were to rebuild pandas, I would make indexing as simple as possible and only use .loc and .iloc. I would not implement __getitem__. There would be no ambiguity. I also wouldn't allow attribute access to columns. It would be a pain to select a single column with df.loc[:, 'col'], but pandas really needs to focus on being explicit.
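Writing code in the explicit style suggested here is already possible today, using only .loc for labels and .iloc for positions. A short sketch (the frame and column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3], "other": [4, 5, 6]}, index=["x", "y", "z"])

# Label-based selection: always .loc, never [] or attribute access (df.col).
col = df.loc[:, "col"]             # single column by label
row = df.loc["y"]                  # single row by label
sub = df.loc[["x", "z"], ["col"]]  # rows and columns by lists of labels

# Position-based selection: always .iloc.
first_row = df.iloc[0]             # first row, whatever its label
corner = df.iloc[:2, :1]           # first two rows, first column
```

The cost is verbosity for the common single-column case, which is exactly the trade-off the comment accepts.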

