pd.Timestamp constructor ignores missing arguments
As part of the discussions in #31563, I came across these strange semantics in pd.Timestamp, where it is apparently legal to over-specify a pd.Timestamp by passing both a datetime (or another Timestamp) and the by-component construction values; any irrelevant arguments are ignored:
>>> pd.Timestamp(datetime(2020, 12, 31),
...              year=1, month=1, day=1,
...              hour=23, minute=59, second=59, microsecond=999999)
Timestamp('2020-12-31 00:00:00')
The signature for the function is:
pd.Timestamp(
ts_input=<object object at 0x7fd988a10760>,
freq=None,
tz=None,
unit=None,
year=None,
month=None,
day=None,
hour=None,
minute=None,
second=None,
microsecond=None,
nanosecond=None,
tzinfo=None,
)
There’s actually a decent amount of redundant information in there, because pd.Timestamp is attempting to have its own constructor in addition to being constructable like a datetime. Properly, there are two overloaded constructors here (note that I’m not sure if nanosecond belongs to both or just one):
pd.Timestamp(ts_input, freq, tz, unit[, nanosecond?])
pd.Timestamp(year, month, day[, hour, minute, second, microsecond, nanosecond, tzinfo])
I think that ideally the correct behavior would be to throw an error if you mix and match between the two, which is at least done in the case of specifying both tz and tzinfo:
>>> pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-d80c9ce6a89d> in <module>
----> 1 pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)
pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
ValueError: Can provide at most one of tz, tzinfo
Though confusingly this also fails if you specify tzinfo at all in the “by-component” constructor. I have filed a separate bug for that at #31929.
Recommendation
I think that the behavior of pandas.Timestamp should probably be brought at least mostly in line with the concept of two overloaded constructors (possibly with tz and tzinfo being mutually exclusive aliases for one another). Any other combination, particularly combinations where the values passed are ignored, should raise an exception.
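To make the recommendation concrete, here is a minimal sketch of the kind of check it implies. It is not pandas code: classify_timestamp_args and its constants are hypothetical names, the argument names are taken from the signature quoted above, and the by-component form is assumed to be keyword-only for simplicity. Since it is unclear which form nanosecond belongs to, the sketch accepts it in both.

```python
# Hypothetical sketch, not part of pandas: decide which of the two
# "overloaded constructors" a call is using and reject mixtures.
_MISSING = object()  # sentinel meaning "ts_input was not passed"
_COMPONENTS = ("year", "month", "day", "hour", "minute", "second", "microsecond")
_INPUT_ONLY = ("freq", "unit")


def classify_timestamp_args(ts_input=_MISSING, **kwargs):
    """Return "ts_input" or "by_component", raising ValueError on a mix."""
    # Only arguments that were given a non-None value count as "specified".
    given = [name for name, value in kwargs.items() if value is not None]

    # tz and tzinfo are treated as mutually exclusive aliases in both forms.
    if "tz" in given and "tzinfo" in given:
        raise ValueError("Can provide at most one of tz, tzinfo")

    components = [name for name in _COMPONENTS if name in given]
    input_only = [name for name in _INPUT_ONLY if name in given]

    if ts_input is not _MISSING:
        if components:
            raise ValueError(
                f"Cannot combine ts_input with by-component arguments: {components}"
            )
        return "ts_input"

    if input_only:
        raise ValueError(
            f"Arguments {input_only} are only valid together with ts_input"
        )
    missing = [name for name in ("year", "month", "day") if name not in given]
    if missing:
        raise ValueError(f"By-component construction requires: {missing}")
    return "by_component"
```

With this check, the over-specified call from the top of the issue would raise instead of silently dropping year=1, month=1, and so on, while each form used on its own is still accepted.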
This may be a breaking change, since it will start raising exceptions in code that didn’t raise exceptions before (though I am not sure I can think of any situation where silently ignoring the values is a desirable condition), so it may be a good idea to have a deprecation period where a warning rather than an exception is raised.
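A deprecation period along these lines could be sketched as follows. This is only an illustration using the standard library: timestamp_with_deprecation is a hypothetical stand-in that returns ts_input unchanged (mirroring the "ignored arguments" behavior shown earlier) and is not how pandas implements anything.

```python
import warnings

# Hypothetical sketch, not pandas code: keep today's behavior (extra
# by-component arguments are ignored) but warn that it will become an error.
_COMPONENT_NAMES = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def timestamp_with_deprecation(ts_input, **components):
    """Return ts_input unchanged, warning about any ignored components."""
    ignored = [
        name for name in _COMPONENT_NAMES if components.get(name) is not None
    ]
    if ignored:
        warnings.warn(
            f"Passing {ignored} together with ts_input is deprecated: these "
            "values are currently ignored and will raise in a future version.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    return ts_input
```

FutureWarning is used here because it is shown to end users by default, unlike DeprecationWarning, which the default filters only display when triggered from `__main__`.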
Top GitHub Comments
pls don’t massively refactor in this PR; i think a PR that both refactors and adds fold would never get through review and be merged
if we want to merge this with a little more review and then potentially do a refactor (likely in multiple steps) then ok with that
we have quite a large test suite for Timestamp so refactors should be easily possible
FWIW - with version 1.4.2 of pandas the following two options work nicely
and
ts1 == ts2 returns True. However, trying to construct the same timestamp using positional values raises an error:
I’m totally not sure whether this issue is the relevant one, but I got here thanks to https://github.com/pandas-dev/pandas/issues/31930#issuecomment-597702139 by @ArtyomKaltovich.