ExtensionArray.map

Both Categorical and SparseArray implement a .map method. It lets them apply a function / mapping efficiently to the categories / sp_values, rather than to every element of the array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380
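A minimal sketch of the efficiency point for Categorical, going through the public Series.map so that it dispatches to the array's own map. The helper `shout` and the `calls` list are hypothetical names used only to count how often the function runs; exact call counts may vary across pandas versions.

```python
import pandas as pd

calls = []  # records every value the mapper is actually invoked on


def shout(label):
    # Hypothetical UDF: upper-case a label and remember that we were called.
    calls.append(label)
    return label.upper()


values = ["a", "b", "a", "a", "b", "a"]
ser = pd.Series(pd.Categorical(values))

# Series.map hands the work to the Categorical, which maps its categories
# and reuses the existing codes instead of visiting every element.
result = ser.map(shout)

print(result.dtype)
print(len(calls), "mapper calls for", len(values), "elements")
```

With only two categories, the mapper should run far fewer times than there are elements, and the result stays categorical because the mapped categories are still unique.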

So, we need to either

  1. Add it to the interface
  2. Hard-code checks for categorical or sparse dtype at that call site.

Do people have a preference? Right now I’m leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

jbrockmendel commented, May 4, 2021 (1 reaction)

"any UDF can remain in the same dtype (which I think is true for categorical and sparse)"

Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs that must have a different sparse dtype. For Categorical you could pass your result to type(self)._from_sequence(result, dtype=self.dtype) and that will usually work, but that's because it will just set any non-fitting element to NaN.
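To illustrate the Categorical caveat, here is a small sketch that uses the public pd.Categorical constructor as a stand-in for the private _from_sequence call mentioned above (an assumption made to keep the example on public API). The mapper `shift` is hypothetical.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])  # categories: ["a", "b"]


def shift(label):
    # Hypothetical UDF: "a" maps to an existing category, "b" does not.
    return {"a": "b", "b": "z"}[label]


mapped = [shift(x) for x in cat]  # ["b", "z", "b"]

# Forcing the result back into the original dtype does not raise;
# values outside the categories are silently replaced by NaN.
coerced = pd.Categorical(mapped, dtype=cat.dtype)

print(coerced)
print(coerced.isna().tolist())
```

The coercion "works" in the sense that it returns a Categorical of the original dtype, but the "z" result is lost, which is the behavior the comment warns about.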

rhshadrach commented, May 4, 2021 (0 reactions)

Ran into this in #39941, where map is used for categorical and sparse in apply. Here, it results in different dtype behavior than other EAs. But it seems to me that map only makes sense when the result of any UDF can remain in the same dtype (which I think is true for categorical and sparse). But how would one implement map for e.g. Int64 where the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found that datetime64 also implements map, and it does not have the property I mentioned.
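The Int64 question can be made concrete with a short sketch using the two mappers from the comment. It goes through Series.map rather than any array-level method, and the resulting dtypes are printed rather than asserted here, since they may differ between pandas versions.

```python
import pandas as pd

ints = pd.Series(pd.array([1, 2, 3], dtype="Int64"))

# Neither mapper can produce values that fit back into Int64.
as_float = ints.map(lambda x: 3.2)
as_text = ints.map(lambda x: "a")

print(ints.dtype, as_float.dtype, as_text.dtype)
```

In both cases the mapped values cannot be stored as integers, so a map that promised to preserve the Int64 dtype would have to either raise or discard the results.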
