ExtensionArray.map
Both Categorical and SparseArray have found it useful to implement a .map method. This allows them to efficiently apply a function / mapping to the categories / sp_values, rather than to every element of the array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380
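To make the efficiency argument concrete, here is a minimal sketch in plain Python. ToyCategorical and CountingUpper are hypothetical names invented for this illustration; this is not how pandas implements Categorical.map, only the general idea of mapping the unique categories once and then broadcasting the result through the integer codes.

```python
class ToyCategorical:
    """Toy stand-in for a categorical array: unique categories plus integer codes."""

    def __init__(self, values):
        # Unique values in first-seen order act as the categories.
        self.categories = list(dict.fromkeys(values))
        index = {cat: i for i, cat in enumerate(self.categories)}
        # Each element is stored as the position of its category.
        self.codes = [index[v] for v in values]

    def map(self, func):
        # Call func once per category, then broadcast through the codes.
        mapped = [func(cat) for cat in self.categories]
        return [mapped[code] for code in self.codes]


class CountingUpper:
    """Upper-cases a string and counts how often it was called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return value.upper()


shout = CountingUpper()
arr = ToyCategorical(["a", "b", "a", "a", "b", "a"])
result = arr.map(shout)
```

With six elements but only two categories, the function is called twice rather than six times, which is the saving the issue refers to.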
So, we need to either
1. Add it to the interface
2. Hard-code checks for categorical or sparse dtype there.
Do people have a preference? Right now I’m leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?
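The two options can be contrasted with a small sketch. Every name here (ToyExtensionArray, ToyConstantArray, map_via_interface, map_hardcoded, fast_types) is hypothetical and exists only for illustration; none of it is pandas code, and the real dispatch in series.py may look quite different.

```python
class ToyExtensionArray:
    """Hypothetical base class; under option 1 it would carry a default map()."""

    def __init__(self, values):
        self.values = list(values)

    def map(self, func):
        # Generic fallback: one call per element.
        return [func(v) for v in self.values]


class ToyConstantArray(ToyExtensionArray):
    """Hypothetical array of one repeated value, so map() needs a single call."""

    def __init__(self, value, length):
        super().__init__([value] * length)
        self.value = value

    def map(self, func):
        return [func(self.value)] * len(self.values)


def map_via_interface(arr, func):
    # Option 1: map() is part of the interface, so the caller just dispatches.
    return arr.map(func)


def map_hardcoded(arr, func, fast_types=(ToyConstantArray,)):
    # Option 2: the caller hard-codes which types get the fast path.
    if isinstance(arr, fast_types):
        return arr.map(func)
    return [func(v) for v in arr.values]


def double(v):
    return v * 2


plain = ToyExtensionArray([1, 2, 3])
const = ToyConstantArray(5, 4)
```

Both helpers return the same values. The difference is where the knowledge lives: option 1 lets third-party arrays opt in by overriding map, while option 2 keeps the interface smaller but requires editing the caller for each new fast type.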
Issue Analytics
- State:
- Created 5 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs that must have a different sparse dtype. For Categorical you could pass your result to
type(self)._from_sequence(result, dtype=self.dtype)
and that will usually work, but that's because it will just set any non-fitting element to NaN.

Ran into this in #39941, where map is used for categorical and sparse in apply. Here, it results in different dtype behavior than other EAs. But it seems to me that map only makes sense when the result of any UDF can remain in the same dtype (which I think is true for categorical and sparse). But how would one implement map for e.g. Int64 where the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found that datetime64 also implements map, and it does not have the property I mentioned.
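The dtype concern in these comments can be illustrated with a toy sketch in plain Python. The helpers coerce_to_categories and coerce_to_int are hypothetical stand-ins, not pandas functions; they only mimic the two behaviours discussed: silently replacing non-fitting values with a missing marker, versus having no sensible way to keep the original dtype.

```python
def coerce_to_categories(values, categories):
    """Lossy coercion: values outside the categories become None (toy NaN)."""
    allowed = set(categories)
    return [v if v in allowed else None for v in values]


def coerce_to_int(values):
    """Strict coercion for a toy integer dtype: reject anything non-integral."""
    out = []
    for v in values:
        if isinstance(v, bool) or not isinstance(v, int):
            raise TypeError(f"cannot store {v!r} in a toy integer array")
        out.append(v)
    return out


# A mapper that leaves the original categories: "a" becomes "A".
mapped_cat = list(map(lambda v: v.upper() if v == "a" else v, ["a", "b", "a"]))
# Forcing the result back into the old categories drops the new values.
kept = coerce_to_categories(mapped_cat, ["a", "b"])

# A mapper like lambda x: 3.2 on integer data cannot stay integer at all.
mapped_int = list(map(lambda x: 3.2, [1, 2, 3]))
try:
    coerce_to_int(mapped_int)
    int_error = None
except TypeError as exc:
    int_error = str(exc)
```

In this sketch the categorical round trip "works" only by discarding the mapped values, and the integer case fails outright, which is why a map that must return the same dtype is hard to define for arbitrary UDFs.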