API: astype mechanism for extension arrays

In https://github.com/pandas-dev/pandas/pull/22343, we’re bumping into a few problems around casting to / from extension arrays.

  1. Our current default implementation for ExtensionArray.astype doesn’t handle target dtypes that are extension types (so extension_array.astype('category') fails). At the moment, each EA has to implement its own astype logic, which is error-prone and difficult.
  2. Some EAs may be more opinionated than others about how astyping should be done. There may exist fastpaths between certain EAs, but only one of the source and destination types may know about that fast path. This (I think) calls for a kind of dispatch mechanism, where each gets to say whether they know how to handle the astyping.
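The dispatch idea in point 2 could look roughly like the following pure-Python sketch. None of these names exist in pandas: `SketchDtype`, `cast_to`, `cast_from`, `construct`, and the `astype` function are all hypothetical stand-ins, assuming each side signals "I don't know this conversion" by returning `NotImplemented`.

```python
class SketchDtype:
    """Hypothetical dtype base class; not the pandas ExtensionDtype API."""

    def cast_to(self, values, target):
        # Source-side hook: override when the source knows a fast path.
        return NotImplemented

    def cast_from(self, values, source):
        # Destination-side hook: override when the target knows a fast path.
        return NotImplemented

    def construct(self, values):
        # Generic element-by-element fallback.
        return list(values)


class IntDtype(SketchDtype):
    def construct(self, values):
        return [int(v) for v in values]


class DecimalStringDtype(SketchDtype):
    def cast_from(self, values, source):
        # Only the destination knows about this path; IntDtype does not.
        if isinstance(source, IntDtype):
            return [str(v) for v in values]
        return NotImplemented

    def construct(self, values):
        return [str(v) for v in values]


def astype(values, source, target):
    """Ask the source first, then the target, then fall back to construct()."""
    result = source.cast_to(values, target)
    if result is NotImplemented:
        result = target.cast_from(values, source)
    if result is NotImplemented:
        result = target.construct(values)
    return result


as_strings = astype([1, 2, 3], IntDtype(), DecimalStringDtype())
as_ints = astype(["4", "5"], DecimalStringDtype(), IntDtype())
```

Asking the source before the target is an arbitrary choice in this sketch; the issue leaves open which side should take priority.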

I’m not sure how far down this road we want to go. Should we instead encourage users to use explicit constructors like .assign(x=pd.Categorical(...)) rather than .astype('category')?
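For comparison, here are the two spellings side by side on a small example. The frame and column name are made up for illustration, and the column is a plain string column rather than an extension array, so this shows only the user-facing difference, not the failing case from point 1.

```python
import pandas as pd

# Illustrative data; the column name "x" follows the snippet in the issue text.
df = pd.DataFrame({"x": ["a", "b", "a"]})

# Dtype-string route: pandas decides how to build the categorical.
via_astype = df.astype({"x": "category"})

# Explicit-constructor route: the caller builds the Categorical directly.
via_constructor = df.assign(x=pd.Categorical(df["x"]))
```

Both routes should give the same categorical column here. The explicit constructor makes the conversion visible at the call site, while `astype` relies on the dtype machinery discussed above.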

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 41 (38 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Sep 16, 2021

I agree that in general (at least the safe default casting) should be about the “same information in different format”, but I think that’s not a hard line, and you could certainly argue that integers hold the same information (given you know the resolution, tz, etc), although that’s certainly debatable 😃

But, on the specific datetime -> integer deprecation:

  • I find it a bit strange to deprecate/disallow it for astype, but then point people to the view instead. There are use cases where you need the integers (e.g. if you want to do some custom rounding, or need to feed it to a system that requires unix time as integers, …), and personally I would rather have users go to astype than view (because astype is the more standard method for this, and if we would go with copy-on-write, view becomes a bit of a strange method …)
    In addition, using view will actually error for non-equal size bitwidth (astype actually as well, but that’s something we can change, while for view that is inherent to the method). And view can also silently overflow if converting to uint64, while for astype we could check for that. In general, I see view as an advanced method you should only use if you really know what you are doing (and in general you don’t really need in pandas, I think)
  • There is no ambiguity around what the expected result would be IMO (for naive datetimes / timedelta)
  • The other way around (integer -> datetime / timedelta) is not deprecated
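The silent-overflow concern with view can be reproduced with NumPy alone. This sketch uses plain NumPy arrays rather than pandas `Series.view` or `Series.astype`, and the helper `to_uint64_checked` is hypothetical, meant only to show the kind of check an astype-style conversion could perform.

```python
import numpy as np

# A timestamp before the Unix epoch has a negative int64 representation.
stamps = np.array(["1960-01-01"], dtype="datetime64[ns]")

as_int = stamps.view("int64")    # reinterprets the bytes; value is negative
as_uint = stamps.view("uint64")  # same bytes, silently wraps to a huge number


def to_uint64_checked(values):
    """Hypothetical checked conversion: refuse values that uint64 cannot hold."""
    ints = values.astype("int64")
    if (ints < 0).any():
        # Note: NaT is stored as the minimum int64, so it is rejected here too.
        raise OverflowError("negative epoch offsets cannot be represented as uint64")
    return ints.astype("uint64")


modern = np.array(["2000-01-01"], dtype="datetime64[ns]")
modern_uint = to_uint64_checked(modern)

try:
    to_uint64_checked(stamps)
    rejected = False
except OverflowError:
    rejected = True
```

The view gives no signal that the pre-1970 value wrapped around, whereas the checked conversion raises.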
0 reactions
jbrockmendel commented, Feb 3, 2022

So if Sally implements FooDtype._cast_to(), then BarArray.astype(any_dtype) can just call any_dtype._cast_to() without having to even know that FooDtype exists. Right?

That’s my understanding of _cast_to, yes. My point was that this behavior already exists with the existing pattern with _from_sequence taking the place of _cast_to.
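A toy sketch of the pattern described here: the source array asks the target dtype for its array class and hands itself to `_from_sequence`, so it never needs to know the target exists. The classes are pure-Python stand-ins, not pandas classes. The method names `construct_array_type` and `_from_sequence` mirror the pandas extension interface, while the string conversion inside `FooArray` is made up for illustration.

```python
class FooArray:
    """Toy array type owned by 'Sally'; stores its values as strings."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        # The destination decides how to build itself from arbitrary scalars.
        return cls(str(s) for s in scalars)


class FooDtype:
    @classmethod
    def construct_array_type(cls):
        return FooArray


class BarArray:
    """Toy source array that knows nothing about FooDtype or FooArray."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)

    def astype(self, dtype):
        # Generic path: delegate construction to whatever dtype was requested.
        array_cls = dtype.construct_array_type()
        return array_cls._from_sequence(self, dtype=dtype)


converted = BarArray([1, 2]).astype(FooDtype())
```

In this sketch only the destination has a say in the conversion. A `_cast_to`-style hook, as discussed in the thread, would be a dedicated entry point for the same delegation.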
