`DataFrame.diff` on `datetime64` returns result inconsistent with `Series.diff`

Code Sample

>>> pd.Series(['NaT', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]').diff()
0      NaT
1      NaT
2   1 days
dtype: timedelta64[ns]

>>> pd.Series(['NaT', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]').to_frame().diff()
                             0
0                          NaT
1 -88855 days +00:12:43.145224
2              1 days 00:00:00

Problem description

Calling .diff() on a DataFrame of datetime64 values should produce timedeltas in exactly the same way as on a Series, i.e. the entry at position (1, 0) in the DataFrame above should be NaT, not an overflowed timedelta64.
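The value in row 1 is consistent with the NaT sentinel being subtracted as an ordinary integer: datetime64[ns] values are stored as int64 nanoseconds since the Unix epoch, with NaT encoded as the smallest int64 (-2**63). The standard-library sketch below assumes exactly that encoding; `wrap_int64` is a hypothetical helper that emulates 64-bit two's-complement wraparound.

```python
from datetime import timedelta

# Assumption: NaT in datetime64[ns] is encoded as the smallest int64 value
NAT = -(2**63)

# 2019-01-01T00:00:00 UTC as nanoseconds since the Unix epoch
t1 = 1_546_300_800 * 10**9


def wrap_int64(x: int) -> int:
    """Hypothetical helper: reduce a Python int to the int64 range,
    as fixed-width two's-complement arithmetic would."""
    return (x + 2**63) % 2**64 - 2**63


# Subtract the NaT sentinel as if it were an ordinary timestamp;
# the true difference exceeds the int64 maximum and wraps around.
raw = wrap_int64(t1 - NAT)
print(raw)

# Interpret the wrapped integer as a duration (truncated to microseconds)
print(timedelta(microseconds=raw // 1000))
```

The printed duration, -88855 days and 12 minutes 43.145224 seconds, matches the `-88855 days +00:12:43.145224` shown in the DataFrame output, which points at missing NaT handling rather than a display problem.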

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.1.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United Kingdom.936

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0.post20200210
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
batterseapower commented, Mar 8, 2020

@shang-vikas: looks like you found the bug, nice one.

I think the code you are referring to is https://github.com/pandas-dev/pandas/blob/706e6427bd1afd05c82eae810d288bbe22ed21c8/pandas/core/algorithms.py#L1900 (Series) and https://github.com/pandas-dev/pandas/blob/9e69040f199266cd7a743dfd2f579f271337687f/pandas/_libs/algos.pyx#L1193 (DataFrame). I’m sure the pandas project would be very happy to have a PR fixing the DataFrame case to match the Series one.
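The Series behaviour referenced here amounts to computing the difference and then masking every position where either operand is NaT. A minimal NumPy sketch of that idea follows; `nat_aware_diff` is a hypothetical helper for illustration, not the pandas implementation, and it only handles 1-D `datetime64[ns]` arrays with a lag of one.

```python
import numpy as np


def nat_aware_diff(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: lag-1 difference of a 1-D datetime64[ns] array
    that yields NaT wherever either operand is NaT."""
    i8 = values.view("i8")  # raw int64 nanoseconds; NaT is the int64 minimum
    out = np.empty(len(values), dtype="m8[ns]")
    out[:1] = np.timedelta64("NaT", "ns")  # the first row has no predecessor
    # Plain integer subtraction, which silently wraps when NaT is involved
    out[1:] = (i8[1:] - i8[:-1]).view("m8[ns]")
    # Overwrite the wrapped results wherever either side was NaT
    mask = np.isnat(values[1:]) | np.isnat(values[:-1])
    out[1:][mask] = np.timedelta64("NaT", "ns")
    return out


arr = np.array(["NaT", "2019-01-01", "2019-01-02"], dtype="M8[ns]")
print(nat_aware_diff(arr))
```

NumPy's own `values[1:] - values[:-1]` on datetime64 arrays already propagates NaT; the explicit mask matters only when the subtraction is done on the int64 view, which appears to be what the DataFrame code path linked above does.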

0 reactions
jbrockmendel commented, Sep 3, 2020

@shang-vikas a pull request would be welcome. It sounds like you've already done most of the work by identifying the source of the problem.
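On an affected pandas version, one workaround consistent with the analysis in this thread is to route each column through `Series.diff`, which already handles NaT correctly. This is a sketch: `diff_by_column` is a hypothetical helper name, and it assumes the columns can be differenced independently (the default `axis=0` case).

```python
import pandas as pd


def diff_by_column(df: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Hypothetical helper: DataFrame.diff along axis 0, computed one column
    at a time with Series.diff so NaT propagates as it does for a Series."""
    return df.apply(lambda col: col.diff(periods))


df = pd.Series(["NaT", "2019-01-01", "2019-01-02"], dtype="datetime64[ns]").to_frame()
print(diff_by_column(df))
```

Applying `Series.diff` per column trades some speed for correctness, since it bypasses the 2-D code path where the NaT check was missing.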
