ExtensionArray.map

Both Categorical and SparseArray implement a .map method. This lets them apply a function or mapping efficiently to the categories (Categorical) or to the sp_values (SparseArray), rather than to every element of the array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380
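The efficiency gain can be illustrated with a minimal pure-Python sketch. `ToyCategorical` (integer codes into a list of categories, with -1 meaning missing) and `ToySparse` (a fill value plus explicitly stored sp_values at sp_index positions) are hypothetical stand-ins, not pandas' actual Categorical or SparseArray; the only point is that the mapper runs once per category or stored value instead of once per element.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ToyCategorical:
    """Hypothetical categorical layout: codes index into categories (-1 = missing)."""

    codes: List[int]
    categories: List[Any]

    def map(self, mapper: Callable[[Any], Any]) -> "ToyCategorical":
        # Apply the mapper once per category, not once per element.
        mapped = [mapper(c) for c in self.categories]
        # Categories that map to the same value are merged (a sketch-level choice).
        new_categories: List[Any] = []
        remap: List[int] = []
        for value in mapped:
            if value not in new_categories:
                new_categories.append(value)
            remap.append(new_categories.index(value))
        # Missing entries (-1) stay missing; everything else is re-coded.
        new_codes = [remap[c] if c >= 0 else -1 for c in self.codes]
        return ToyCategorical(new_codes, new_categories)

    def to_list(self) -> List[Any]:
        return [self.categories[c] if c >= 0 else None for c in self.codes]


@dataclass
class ToySparse:
    """Hypothetical sparse layout: positions not in sp_index hold fill_value."""

    length: int
    fill_value: Any
    sp_index: List[int]
    sp_values: List[Any]

    def map(self, mapper: Callable[[Any], Any]) -> "ToySparse":
        # Map the fill value once, plus each explicitly stored value once.
        return ToySparse(
            self.length,
            mapper(self.fill_value),
            list(self.sp_index),
            [mapper(v) for v in self.sp_values],
        )

    def to_list(self) -> List[Any]:
        dense = [self.fill_value] * self.length
        for i, v in zip(self.sp_index, self.sp_values):
            dense[i] = v
        return dense


calls: List[str] = []


def upper(s: str) -> str:
    calls.append(s)  # record every call so they can be counted
    return s.upper()


cat = ToyCategorical(codes=[0, 1, 0, 0, -1, 1], categories=["a", "b"])
print(cat.map(upper).to_list())  # ['A', 'B', 'A', 'A', None, 'B']
print(len(calls))  # 2 calls for 6 elements

sparse = ToySparse(length=5, fill_value=0, sp_index=[1, 3], sp_values=[10, 20])
print(sparse.map(lambda x: x + 1).to_list())  # [1, 11, 1, 21, 1]
```

Merging categories that map to the same value is a simplification chosen for this sketch; a real implementation has to decide what to return when the mapping is not one-to-one.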

So, we need to either:

1. add it to the interface, or
2. hard-code checks for categorical or sparse dtype there.

Do people have a preference? Right now I'm leaning toward option 2. Or are there other array types that would see an efficiency gain similar to Categorical or Sparse?
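The two options can be contrasted with another hypothetical sketch. `BaseArray`, `CatArray`, `map_option1` and `map_option2` are illustrative names, not pandas internals, and `CatArray` assumes hashable values; what differs between the options is where the knowledge about fast paths lives.

```python
from typing import Any, Callable, Dict, List


class BaseArray:
    """Hypothetical extension-array stand-in backed by a plain list."""

    def __init__(self, values: List[Any]) -> None:
        self.values = list(values)

    def map(self, mapper: Callable[[Any], Any]) -> List[Any]:
        # Option 1: the interface ships a generic elementwise default.
        return [mapper(v) for v in self.values]


class CatArray(BaseArray):
    """Hypothetical categorical-like array with a faster map (hashable values)."""

    def map(self, mapper: Callable[[Any], Any]) -> List[Any]:
        # Call the mapper once per distinct value, then broadcast back.
        cache: Dict[Any, Any] = {v: mapper(v) for v in dict.fromkeys(self.values)}
        return [cache[v] for v in self.values]


def map_option1(arr: BaseArray, mapper: Callable[[Any], Any]) -> List[Any]:
    # Option 1: the caller trusts the interface; each type answers .map itself.
    return arr.map(mapper)


def map_option2(arr: BaseArray, mapper: Callable[[Any], Any]) -> List[Any]:
    # Option 2: the caller hard-codes the types it knows have a fast path.
    if isinstance(arr, CatArray):
        return arr.map(mapper)
    # Every other type gets a generic elementwise loop, even if it defines .map.
    return [mapper(v) for v in arr.values]


print(map_option1(CatArray([1, 1, 2, 1]), lambda x: x * 10))  # [10, 10, 20, 10]
```

Under option 1, a third-party array type could get the fast path by overriding a single method; under option 2, only the types named in the isinstance check benefit, which is why the question about other array types matters.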

Issue Analytics

State:
Created 5 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction

jbrockmendel commented, May 4, 2021

"any UDF can remain in the same dtype (which I think is true for categorical and sparse)"

Not exactly. For sparse, you can make the return dtype always be sparse, but we can come up with UDFs whose results need a different sparse dtype. For Categorical, you could pass your result to type(self)._from_sequence(result, dtype=self.dtype), and that will usually work, but that's because it will just set any non-fitting element to NaN.
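The Categorical point can be illustrated with a pure-Python sketch of what coercing mapped results back into the original categories does. `coerce_to_categories` is a hypothetical helper that only mimics the effect described for `type(self)._from_sequence(result, dtype=self.dtype)` (any result that is not an existing category becomes missing); it is not pandas code. The last lines show the sparse analogue, where the fill value's type can change.

```python
import math
from typing import Any, List


def coerce_to_categories(result: List[Any], categories: List[Any]) -> List[Any]:
    """Hypothetical helper: keep results that are existing categories, else NaN.

    Mimics the effect described for
    type(self)._from_sequence(result, dtype=self.dtype); it is not pandas code.
    """
    allowed = set(categories)
    return [v if v in allowed else math.nan for v in result]


categories = ["a", "b", "c"]
values = ["a", "b", "c", "a"]

# A UDF whose outputs stay within the categories loses nothing.
rotated = [{"a": "b", "b": "c", "c": "a"}[v] for v in values]
print(coerce_to_categories(rotated, categories))  # ['b', 'c', 'a', 'b']

# A UDF whose outputs fall outside the categories is silently turned into NaN.
uppered = [v.upper() for v in values]
print(coerce_to_categories(uppered, categories))  # [nan, nan, nan, nan]

# Sparse analogue: an int fill value can map to a float, forcing a new subtype.
fill_value = 0
print(type(fill_value).__name__, type(fill_value / 2).__name__)  # int float
```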

0 reactions

rhshadrach commented, May 4, 2021

Ran into this in #39941, where map is used for categorical and sparse in apply. Here, it results in different dtype behavior than for other EAs. But it seems to me that map only makes sense when any UDF can remain in the same dtype (which I think is true for categorical and sparse). But how would one implement map for, e.g., Int64, where the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found that datetime64 also implements map, which does not have the property I mentioned.
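One possible answer to the Int64 question is to stop forcing the original dtype and instead infer the result type from what the mapper returns. The sketch below assumes a hypothetical Int64-like layout (a list of ints plus a boolean missing mask) and a hypothetical `map_masked_int` helper whose dtype names are plain strings; it does not describe what pandas actually does.

```python
import math
from typing import Any, Callable, List, Tuple


def map_masked_int(
    values: List[int], mask: List[bool], mapper: Callable[[int], Any]
) -> Tuple[str, List[Any]]:
    """Hypothetical map for an Int64-like array (values plus missing mask).

    The mapper runs only on non-missing entries; the result "dtype" is then
    inferred from what the mapper returned instead of being forced to stay int.
    """
    mapped = [mapper(v) for v, m in zip(values, mask) if not m]
    if all(isinstance(x, int) and not isinstance(x, bool) for x in mapped):
        kind, na = "Int64", None  # still integral: keep a masked-int result
    elif all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in mapped):
        kind, na = "float64", math.nan  # floats: missing entries become NaN
    else:
        kind, na = "object", None  # anything else (e.g. str): fall back to object
    it = iter(mapped)
    return kind, [na if m else next(it) for m in mask]


values, mask = [1, 2, 0], [False, False, True]  # the third entry is missing
print(map_masked_int(values, mask, lambda x: x + 1))  # ('Int64', [2, 3, None])
print(map_masked_int(values, mask, lambda x: 3.2))  # ('float64', [3.2, 3.2, nan])
print(map_masked_int(values, mask, lambda x: "a"))  # ('object', ['a', 'a', None])
```

The trade-off is that the result dtype now depends on the data the UDF happens to see, which is exactly the kind of inconsistency with other EAs raised in the comment above.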
