API/DES: Non-Nanosecond Tracker

Support for non-nanosecond timedelta64, datetime64, and datetime64tz is coming along. The next big planned steps are to get the Timedelta and Timestamp scalars to support non-nano resolutions. There are a few design options that I’d like to get input on. cc @pandas-dev/pandas-core

I’m doing Timedelta first mostly because that is more conducive to doing as a scoped PR with dedicated testing. Current plan is to make a dedicated constructor like Timedelta._from_value_and_reso (so I can write tests) that can be removed once we decide on the public behavior. Which brings us to the questions:

  • Do we add a reso-like keyword to the constructors? Or use/respect “unit”?
  • With non-nano np.timedelta64 objects that don’t overflow when cast to nano, do we still cast? e.g. does `np.datetime64(4, "s")` become ns or stay s?
  • Similar with pytimedelta. pd.Timedelta(timedelta(days=106752)) currently raises, in the future will presumably come back with a ‘us’ reso. So what about currently non-raising cases like pd.Timedelta(timedelta(days=106751))? Does it stay ns or become us?
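
As a rough illustration of where the 106751/106752-day boundary in the last bullet comes from, the sketch below checks whether a `datetime.timedelta` fits in a signed 64-bit tick count at nanosecond versus microsecond resolution. It uses only the standard library; the helper names `total_microseconds` and `fits_in_int64` are made up for this example and are not pandas API.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

# Ticks per microsecond for the two resolutions under discussion.
TICKS_PER_MICROSECOND = {"us": 1, "ns": 1000}


def total_microseconds(td: timedelta) -> int:
    """Exact integer number of microseconds in td (no float rounding)."""
    return td // timedelta(microseconds=1)


def fits_in_int64(td: timedelta, reso: str) -> bool:
    """True if td, counted in `reso` ticks, fits in a signed int64.

    The symmetric bound excludes the most negative int64, which pandas
    reserves as its NaT sentinel.
    """
    ticks = total_microseconds(td) * TICKS_PER_MICROSECOND[reso]
    return abs(ticks) <= INT64_MAX


print(fits_in_int64(timedelta(days=106751), "ns"))  # True
print(fits_in_int64(timedelta(days=106752), "ns"))  # False
print(fits_in_int64(timedelta(days=106752), "us"))  # True
```

So the 106752-day case is exactly the first whole-day value that cannot be stored in nanoseconds but can be stored in microseconds.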

Other

  1. What happens to (class attributes) (Timestamp|Timedelta)(min|max|resolution)?

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
datapythonista commented, May 19, 2022

If I’m not missing anything, the only user code that could “break” with what you propose is if the user has code creating second precision datetime data, which currently breaks, and after your changes would work.

Personally I think:

  1. It is very unlikely that users have code with second precision, or with precision obtained at runtime, since it’s not implemented.
  2. If someone did implement such code, I assume it was in order to use it once this was implemented, and they would be happy to see it finally working.

If I’m not missing something, I’d personally add this directly without overcomplicating things.

0 reactions
jbrockmendel commented, Jun 20, 2022

Status Update 2022-06-20. The constructors point is the most pressing.

TODO - Technical:
    - Timestamp/Timedelta constructors will choke on values outside the pydatetime/pytimedelta implementation bounds; work around this.
    - PERF
        The implementation of non-nano support means many new checks on reso, which necessarily hurts performance. Try to get some of that back. Ideas:
        - Cache Localizer
        - Localizer.utc_val_to_local_val could be nogil and avoid the `except? -1` if restricted to the non-tzlocal/zoneinfo cases.  Re-separating these might improve performance at the cost of re-duplicating some code.
    - Not Yet Implemented
        - pd.io compat
        - pd.plotting compat

TODO - API Design:
    - Constructors
        Timestamp, Timedelta, to_datetime, to_timedelta, date_range, timedelta_range, others?

        - How to specify desired resolution? Options:
            a) add a keyword e.g. "reso", e.g.

                In [4]: pd.Timestamp(4567, unit="us", reso="us")

            b) repurpose the "unit" keyword to specify the output resolution, e.g.

                In [4]: pd.Timestamp(4567, unit="us")

            The main place where this would cause an issue is if the user wants a different input resolution vs output resolution, in which case we would direct the user to convert post-construction along the lines of:

                In [5]: pd.Timestamp(4567, unit="s").as_unit("us")

            (ATM there is a private _as_unit which will likely be made public)

            c) don't specify it, infer it from the input, let users convert post-construction if they want.

            If we were starting fresh, b) would be my preferred API.  Backwards-compat concerns make a) more appealing. If we want to provide something user-facing in 1.5, a) seems like the only option.
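
To make the distinction between input unit and output resolution concrete, here is a minimal pure-Python sketch that models a scalar as an integer tick count plus a unit string. Everything in it (`convert_ticks`, `construct`, the `reso` parameter) is hypothetical and only mimics the shape of options a) and b); it is not how pandas implements its constructors.

```python
# Ticks per second for each resolution discussed in this issue.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def convert_ticks(value: int, unit: str, reso: str) -> int:
    """Re-express `value` counted in `unit` as a count in `reso`.

    Raises ValueError when going to a coarser unit would drop information.
    """
    src, dst = UNITS_PER_SECOND[unit], UNITS_PER_SECOND[reso]
    if dst >= src:
        return value * (dst // src)
    factor = src // dst
    if value % factor != 0:
        raise ValueError(f"cannot losslessly convert {value}{unit} to {reso}")
    return value // factor


def construct(value: int, unit: str, reso=None):
    """Toy constructor returning (ticks, resolution).

    Passing `reso` mimics option a); omitting it mimics option b), where
    the input unit doubles as the output resolution.
    """
    reso = unit if reso is None else reso
    return convert_ticks(value, unit, reso), reso


print(construct(4567, unit="us", reso="us"))  # (4567, 'us')
print(construct(4567, unit="us"))             # (4567, 'us')
print(construct(4567, unit="s", reso="us"))   # (4567000000, 'us')
```

Under option b) the third call would have to be written as a construction in seconds followed by a separate conversion step, which is the post-construction conversion described above.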

        - Resolution Inference - scalars
            - np.datetime64, np.timedelta64 have natural resolutions attached to them. We should preserve these wherever possible and cast to the nearest supported resolution otherwise.  e.g. Timestamp(np.datetime64(4, "ms")) would retain millisecond resolution.
            - pydatetime and pytimedelta objects naturally have microsecond resolution. The Timestamp/Timedelta constructors should preserve these.
            - strings are the harder case.  Should e.g. `Timestamp("2022-06-20 08:21:04")` come back with second-resolution?  I lean towards "yes".

        - Resolution Inference - arrays
            If we do inference on scalars, presumably we would want the analogous inference on arrays of scalars. This raises questions about what to do with mixed-resolution inputs:

                In [3]: pd.to_datetime(["2016-01-02 03:04:05", "2016-06-07 08:09:10.111", "2017-11-20 12:13:14.156789"])
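
The issue leaves the mixed-resolution question open. One possible answer, sketched below purely as an assumption, is to infer a resolution per string from the number of fractional-second digits and then use the finest resolution found for the whole array. The helpers `infer_reso` and `common_reso` are invented for this example, only handle strings that end in `HH:MM:SS[.fraction]`, and are not pandas functions.

```python
import re

# Matches a trailing HH:MM:SS with an optional fraction of 1 to 9 digits.
_TIME_WITH_FRACTION = re.compile(r"\d{2}:\d{2}:\d{2}(?:\.(\d{1,9}))?$")

# Resolutions ordered from coarsest to finest.
_RESOLUTIONS = ["s", "ms", "us", "ns"]


def infer_reso(text: str) -> str:
    """Guess a resolution from how many fractional-second digits are written."""
    match = _TIME_WITH_FRACTION.search(text.strip())
    if match is None:
        raise ValueError(f"unsupported datetime string: {text!r}")
    digits = len(match.group(1) or "")
    # 0 digits -> "s", 1-3 -> "ms", 4-6 -> "us", 7-9 -> "ns"
    return _RESOLUTIONS[(digits + 2) // 3]


def common_reso(texts) -> str:
    """Finest resolution among the inputs (one possible policy, not the only one)."""
    return max((infer_reso(t) for t in texts), key=_RESOLUTIONS.index)


inputs = [
    "2016-01-02 03:04:05",
    "2016-06-07 08:09:10.111",
    "2017-11-20 12:13:14.156789",
]
print([infer_reso(t) for t in inputs])  # ['s', 'ms', 'us']
print(common_reso(inputs))              # us
```

Note that this counts written digits rather than significant ones, so a string ending in `.100` would still be treated as millisecond resolution.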


    - Arithmetic between Timestamps/Timedelta of mismatched resolution?
        - Options:
            a) Raise in all cases
                We currently do this for division (truediv, rtruediv, floordiv, rfloordiv) on Timedelta
            b) Cast to the higher resolution like numpy:

                In [2]: td = np.timedelta64(1, "h")
                In [3]: dt = np.datetime64("1994-01-02 03:04:05")
                In [4]: dt + td
                Out[4]: numpy.datetime64('1994-01-02T04:04:05')

                In [5]: td + dt
                Out[5]: numpy.datetime64('1994-01-02T04:04:05')

            c) Cast to the lower resolution
            d) (the current choice) Cast to the lower resolution *if* doing so is lossless

                In [3]: ts = pd.Timestamp("2016-01-01 02:03:04")._as_unit("s")
                In [4]: other = pd.Timestamp(500_000_000)._as_unit("ms")

                In [5]: ts - other
                [...]
                ValueError: Timestamp subtraction with mismatched resolutions is not allowed when casting to the lower resolution would require lossy rounding.

                i) ATM (2022-06-20 16:01 UTC) this check is in place for Timestamp.__sub__(datetimelike) but not for Timedelta. The intention is to implement it for all the relevant add/sub ts/td combinations.
                ii) ATM (2022-06-19 16:01 UTC) this check is not implemented for DatetimeArray or TimedeltaArray.
            
                iii) The main reason for choosing d) over a) is that a) would break user code if/when, as expected, we change constructor behavior for Timestamp(pydatetime_object) to have microsecond resolution, e.g:
                    In [3]: pd.Timestamp.now() - pd.Timestamp(datetime.now())
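
Option d) can be sketched in plain Python by again treating each scalar as an integer tick count plus a unit. This is only a model of the rule "cast to the lower resolution if that is lossless, otherwise raise"; `cast_lossless` and `subtract` are hypothetical helpers, not the pandas implementation.

```python
from datetime import datetime, timezone

# Ticks per second for each resolution; fewer ticks means lower resolution.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def cast_lossless(value: int, unit: str, target: str) -> int:
    """Convert a tick count between units, refusing any lossy rounding."""
    src, dst = UNITS_PER_SECOND[unit], UNITS_PER_SECOND[target]
    if dst >= src:
        return value * (dst // src)
    factor = src // dst
    if value % factor != 0:
        raise ValueError(
            "casting to the lower resolution would require lossy rounding"
        )
    return value // factor


def subtract(a_value: int, a_unit: str, b_value: int, b_unit: str):
    """Return (a - b, unit), computed in the lower of the two resolutions."""
    target = min(a_unit, b_unit, key=UNITS_PER_SECOND.__getitem__)
    a = cast_lossless(a_value, a_unit, target)
    b = cast_lossless(b_value, b_unit, target)
    return a - b, target


# 2016-01-01 02:03:04 UTC as whole seconds since the epoch.
ts_seconds = int(datetime(2016, 1, 1, 2, 3, 4, tzinfo=timezone.utc).timestamp())

# 2000 ms is exactly 2 s, so the cast to seconds is lossless.
print(subtract(ts_seconds, "s", 2_000, "ms"))

# 500 ms cannot be represented in whole seconds, so this raises.
try:
    subtract(ts_seconds, "s", 500, "ms")
except ValueError as err:
    print("ValueError:", err)
```

The second call mirrors the `Timestamp(500_000_000)._as_unit("ms")` case above: 500,000,000 ns is 500 ms, which has no exact second-resolution representation.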

    - find_common_type with mixed resolution dt64/td64
        - numpy returns object
            In [19]: np.find_common_type([np.dtype("M8[ns]"), np.dtype("M8[s]")], [])
            Out[19]: dtype('O')

        - To be consistent with the arithmetic logic described above, we would want the common type to be the lower resolution if that cast is lossless, and object otherwise.
            This is value-dependent behavior that we otherwise try to avoid.

    - (Timestamp|Timedelta).(min|max|resolution)
        - ATM these are class attributes that assume Timestamp/Timedelta are always in nanoseconds, so will be incorrect for non-nano instances.
        - One solution is a descriptor that behaves differently depending on whether it is accessed on the class or an instance. e.g. `Timestamp.min` remains unchanged but `timestamp_instance_with_second_reso.min` returns the minimum second-resolution Timestamp.
        - Another solution would be for Timestamp.min to return a dict-like so you would do e.g. `Timestamp.min["ns"]`
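
The descriptor idea can be illustrated with a toy class. This is a sketch under stated assumptions: `ToyTimestamp` and `_ResoAwareMin` are invented names, and the bound used (the most negative int64 plus one, leaving room for a NaT-style sentinel) ignores any additional limits a real implementation would need for coarser resolutions.

```python
class _ResoAwareMin:
    """Non-data descriptor whose result depends on how it is accessed."""

    def __get__(self, instance, owner):
        # Accessed on the class: instance is None, so keep the historical
        # nanosecond behaviour. Accessed on an instance: use its own unit.
        unit = "ns" if instance is None else instance.unit
        return owner(-(2**63) + 1, unit)


class ToyTimestamp:
    """Minimal stand-in for a scalar storing an int64 tick count and a unit."""

    min = _ResoAwareMin()

    def __init__(self, value: int, unit: str = "ns"):
        self.value = value
        self.unit = unit

    def __repr__(self):
        return f"ToyTimestamp({self.value}, unit={self.unit!r})"


second_reso = ToyTimestamp(0, unit="s")

print(ToyTimestamp.min.unit)  # ns
print(second_reso.min.unit)   # s
```

The dict-like alternative would instead make the class attribute a mapping keyed by unit, trading the implicit class-versus-instance behaviour for an explicit lookup.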