pd.Timestamp constructor ignores missing arguments

As part of the discussions in #31563, I came across some strange semantics in pd.Timestamp: it is apparently legal to over-specify a pd.Timestamp by passing both a datetime (or another Timestamp) and the by-component construction values, in which case the irrelevant arguments are silently ignored:


>>> pd.Timestamp(datetime(2020, 12, 31),
                 year=1, month=1, day=1,
                 hour=23, minute=59, second=59, microsecond=999999)
Timestamp('2020-12-31 00:00:00')

The signature for the function is:

pd.Timestamp(
    ts_input=<object object at 0x7fd988a10760>,
    freq=None,
    tz=None,
    unit=None,
    year=None,
    month=None,
    day=None,
    hour=None,
    minute=None,
    second=None,
    microsecond=None,
    nanosecond=None,
    tzinfo=None,
)

There’s actually a decent amount of redundant information in there, because pd.Timestamp is attempting to have its own constructor in addition to being constructible like a datetime. Properly, there are two overloaded constructors here (note that I’m not sure if nanosecond belongs to both or just one):

pd.Timestamp(ts_input, freq, tz, unit[, nanosecond?])
pd.Timestamp(year, month, day
             [, hour, minute, second, microsecond, nanosecond, tzinfo])

I think that ideally the correct behavior would be to throw an error if you mix and match between the two, which is at least done in the case of specifying both tz and tzinfo:

>>> pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-d80c9ce6a89d> in <module>
----> 1 pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

ValueError: Can provide at most one of tz, tzinfo

Though confusingly this also fails if you specify tzinfo at all in the “by-component” constructor. I have filed a separate bug for that at #31929.

Recommendation

I think that the behavior of pandas.Timestamp should probably be brought at least mostly in line with the concept of two overloaded constructors (possibly with tz and tzinfo being mutually exclusive aliases for one another). Any other combination, particularly one where the values passed are ignored, should raise an exception.
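A minimal sketch of what such a check could look like, written in plain Python rather than taken from pandas. The function name check_timestamp_args and the exact error wording for the mixed case are hypothetical, and the sketch only handles by-component values passed as keywords:

```python
from datetime import datetime

# By-component arguments that belong to the datetime-style constructor.
_COMPONENT_ARGS = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def check_timestamp_args(ts_input=None, **kwargs):
    """Hypothetical validator: reject calls that mix the two constructor styles."""
    given = [name for name in _COMPONENT_ARGS if kwargs.get(name) is not None]
    if ts_input is not None and given:
        # Values that would otherwise be silently ignored now raise.
        raise TypeError(
            "cannot combine ts_input with by-component arguments: " + ", ".join(given)
        )
    if kwargs.get("tz") is not None and kwargs.get("tzinfo") is not None:
        # Same wording as the existing pandas error quoted above.
        raise ValueError("Can provide at most one of tz, tzinfo")
    return "ts_input" if ts_input is not None else "components"


style_a = check_timestamp_args(datetime(2020, 12, 31))
style_b = check_timestamp_args(year=2020, month=12, day=31)
```

With this check in place, the over-specified call from the top of the issue would raise instead of quietly returning the datetime's value.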

This may be a breaking change, since it will start raising exceptions in code that didn’t raise exceptions before (though I am not sure I can think of any situation where silently ignoring the values is a desirable condition), so it may be a good idea to have a deprecation period where a warning rather than an exception is raised.
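For the deprecation period, one possible shape is sketched below using the standard warnings module. The function build_datetime is a hypothetical stand-in for the constructor, not pandas code, and it covers only year, month and day to keep the example short:

```python
import warnings
from datetime import datetime


def build_datetime(ts_input=None, *, year=None, month=None, day=None):
    """Hypothetical stand-in for the constructor during a deprecation period."""
    components = {"year": year, "month": month, "day": day}
    ignored = sorted(name for name, value in components.items() if value is not None)
    if ts_input is not None:
        if ignored:
            # Keep the old behavior (ignore the extras) but tell the caller.
            warnings.warn(
                f"arguments {ignored} are ignored when ts_input is given; "
                "this will raise in a future version",
                FutureWarning,
                stacklevel=2,
            )
        return ts_input
    return datetime(year, month, day)
```

Existing code keeps getting the same result during the transition, and the warning can later be turned into an exception without changing the call signature.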

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

1 reaction
jreback commented, Feb 13, 2020

pls don’t massively refactor in this PR; i think a combined refactor-and-add-fold PR would never get reviewed and merged

if we want to merge this with a little more review and then potentially do a refactor (likely in multiple steps) then ok with that

we have quite a large suite for Timestamp so refactors should be easily possible

0 reactions
drorata commented, Apr 11, 2022

FWIW - with version 1.4.2 of pandas the following two options work nicely

ts1 = pd.Timestamp("2009-01-10", tz="UTC")
ts2 = pd.Timestamp(year=2009, month=1, day=10, tz='UTC')

and ts1 == ts2 returns True. However, trying to construct the same timestamp using positional values:

pd.Timestamp(2009,1, 10, tz='UTC')

raises an error:

TypeError                                 Traceback (most recent call last)
/var/folders/hc/m_f707sd7t3ghfvxn5013sfm0000gp/T/ipykernel_39707/770468094.py in <module>
----> 1 pd.Timestamp(2009,1, 10, tz='UTC')

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

TypeError: __new__() got multiple values for keyword argument 'tz'

I’m not at all sure whether this is the relevant issue, but I got here thanks to https://github.com/pandas-dev/pandas/issues/31930#issuecomment-597702139 by @ArtyomKaltovich.
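One plausible reading of this error, assuming the positional parameter order in 1.4.2 still matches the signature quoted in the issue (ts_input, freq, tz, unit, ...): the third positional value, 10, binds to tz, so the tz='UTC' keyword supplies a second value for the same parameter. The plain-Python stand-in below (fake_timestamp is a hypothetical name, not pandas code) reproduces the same kind of TypeError:

```python
def fake_timestamp(ts_input=None, freq=None, tz=None, unit=None,
                   year=None, month=None, day=None):
    """Stand-in with the same leading parameter order as the quoted signature."""
    return {"ts_input": ts_input, "freq": freq, "tz": tz}


# Positional values fill parameters left to right: 2009 -> ts_input, 1 -> freq, 10 -> tz.
bound = fake_timestamp(2009, 1, 10)

try:
    fake_timestamp(2009, 1, 10, tz="UTC")
except TypeError as exc:
    # Python reports that 'tz' received both a positional and a keyword value.
    message = str(exc)
```

Under that reading, passing the components by keyword, as in the ts2 example above, avoids the clash because nothing is bound to tz positionally.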
