BUG: pd.array([timedelta_like_strings]) should infer TimedeltaArray
>>> left = pd.array(['59 days', '59 days', pd.NaT])
>>> left
<StringArray>
['59 days', '59 days', <NA>]
Length: 3, dtype: string
>>> left = pd.array(['59 days', '59 days', pd.NaT], dtype='m8[ns]')
>>> left
<TimedeltaArray>
['59 days', '59 days', NaT]
Length: 3, dtype: timedelta64[ns]
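One way to avoid depending on inference at all is to state the target dtype explicitly, as the second call above does, or to convert with pd.to_timedelta. The following is a minimal sketch, not a proposed fix; the variable names (values, explicit, converted) are illustrative only, and it assumes a pandas version in which both calls accept timedelta-like strings mixed with pd.NaT.

```python
import pandas as pd

# Timedelta-like strings mixed with a missing value, as in the report above.
values = ["59 days", "59 days", pd.NaT]

# Option 1: request the timedelta dtype explicitly instead of relying on inference.
explicit = pd.array(values, dtype="m8[ns]")

# Option 2: parse the strings with pd.to_timedelta, which returns a TimedeltaIndex.
converted = pd.to_timedelta(values)

print(type(explicit).__name__, explicit.dtype)
print(type(converted).__name__, converted.dtype)
```

Both results hold real timedelta values, so the missing entry is NaT rather than a string-dtype <NA>.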
Issue Analytics
- State:
- Created 3 years ago
- Comments:19 (19 by maintainers)
Top GitHub Comments
yah, i closed bc i thought there was consensus that the title of the issue was wrong; i.e. you guys convinced me.
Sorry for the confusion here 😉 (and ignore the title of the issue for a moment, that’s indeed in need of an update if we agree what this issue is about)
The only reason that I reopened this is because (I thought) our earlier discussion (@TomAugspurger see your comment above at https://github.com/pandas-dev/pandas/issues/33558#issuecomment-614096100 which says “Are we agreed that the expected dtype for these mixed cases (strings and NaT) is object?”) concluded that the current behaviour (inferring this mixed case to string) is not wanted (we want to infer mixed case to object instead). So since the example code that started this issue is not doing on master what we want it to do, it seems worth it to have an open issue about this (even though the original proposal of @jbrockmendel when opening this issue is different).
Well, I was not actually thinking about that, but now you say it … 😉 That’s maybe indeed what we should do.
There are basically two “different” behaviours to consider (and when reopening this issue I was actually only thinking about the first):
- pd.array(['59 days', pd.NaT]) -> infers string dtype
- pd.Series(['59 days', pd.NaT]) -> infers timedelta64[ns] dtype
Ideally, we want to have both of those consistent, agreed? And it seems there is also agreement on inferring as object dtype? (but noting again that raising an error is also still an option, users can specify dtype=object if they really meant to create an object-dtype series)
For the first one, I think we can still simply change, because pd.array()'s behaviour is still in flux anyway (string dtype is experimental). The Series behaviour, if we agree on changing this default inference, indeed is something we need to deprecate. The question might be if we already want to deprecate this now, or have this integrated in a general move to pd.array behaviour / nullable dtypes we need to do at some point (there are other differences in behaviour between pd.Series() inference and pd.array() inference as well)
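The two inference paths compared in this comment can be checked side by side on whatever pandas version is installed. This is a rough sketch: the helper inferred_dtype and the report dict are hypothetical names introduced here, not pandas API, and no particular output is claimed because the inferred dtypes are exactly what this issue proposes to change. The helper also reports an exception instead of failing, since raising is one of the outcomes under discussion.

```python
import pandas as pd

# The mixed case from the discussion: a timedelta-like string plus NaT.
values = ["59 days", pd.NaT]


def inferred_dtype(constructor, data):
    """Return the dtype a constructor infers, or the name of the error it raises."""
    try:
        return str(constructor(data).dtype)
    except (TypeError, ValueError) as exc:
        return f"raised {type(exc).__name__}"


# Compare default inference for pd.array and pd.Series on the same input.
report = {
    "pd.array": inferred_dtype(pd.array, values),
    "pd.Series": inferred_dtype(pd.Series, values),
}
for name, dtype in report.items():
    print(f"{name}: {dtype}")

# Passing dtype=object bypasses inference and keeps the values as given.
as_object = pd.Series(values, dtype=object)
print(as_object.dtype)
```

If the two reported dtypes differ, the installed version still has the inconsistency described above; the explicit dtype=object call behaves the same either way.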