BUG: `pd.to_numeric` has an inconsistent behavior for `datetime` objects

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> from datetime import datetime
>>> from functools import partial
>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan
>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64
>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
>>>
>>> pd.to_numeric(pd.NaT, errors="coerce")
nan
>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64
>>> pd.Series([pd.NaT]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Problem description

When pd.to_numeric converts a pd.Series with dtype datetime64[ns], it returns different values than when the same values are converted one at a time.
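For context, the integers in the Series results above look like the raw int64 values that back datetime64[ns] data: nanoseconds since the Unix epoch for a real date, and the minimum int64 as the NaT sentinel. A minimal sketch to check that reading (the variable names are only illustrative):

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Nanoseconds since 1970-01-01 for the date used in the report.
epoch_ns = pd.Timestamp(datetime(2021, 8, 22)).value

# NaT is stored as the smallest int64, which is the value the Series path exposes.
nat_sentinel = pd.NaT.value

print(epoch_ns)  # 1629590400000000000
print(nat_sentinel == np.iinfo(np.int64).min)  # True
```

So the Series path seems to expose the storage value as-is, while the scalar path coerces the object to NaN.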

Expected Output

Converting a pd.Series as a whole should give the same result as converting it value by value. I am not sure what the correct output should be, but IMO the output should be consistent in these two scenarios.

What I suggest:

  • For non-null values, return the same value in both cases. Maybe the integer?
  • For pd.NaT, always return np.NaN
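One way to picture the suggested behavior is sketched below. Both helpers, scalar_to_epoch_ns and series_to_epoch_ns, are hypothetical names made up for this illustration and are not part of pandas; the sketch assumes timezone-naive input and uses float64 so that NaN can represent pd.NaT.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def scalar_to_epoch_ns(value):
    """Hypothetical helper: one date-like scalar -> epoch nanoseconds, or NaN for NaT."""
    ts = pd.Timestamp(value)
    return np.nan if pd.isna(ts) else float(ts.value)


def series_to_epoch_ns(series):
    """Hypothetical helper: the same rule, applied to a whole datetime Series at once."""
    raw = series.to_numpy().astype("datetime64[ns]")
    out = raw.view("int64").astype("float64")
    out[np.isnat(raw)] = np.nan  # replace the NaT sentinel with NaN
    return pd.Series(out, index=series.index)


dates = pd.Series([datetime(2021, 8, 22), pd.NaT])
whole = series_to_epoch_ns(dates)          # converted as a whole
by_value = dates.apply(scalar_to_epoch_ns)  # converted value by value
print(whole.equals(by_value))
```

Note that float64 cannot represent every nanosecond count exactly, so a real fix might prefer a nullable integer dtype; that choice is outside what the issue proposes.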

Output of pd.show_versions()

I am using the latest version of master as of today.

INSTALLED VERSIONS

commit : e39ea3024cebb4e7a7fd35972a44637de6c41650
python : 3.8.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+517.gc3761e24d8
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

1 reaction
hec10r commented, Sep 1, 2021

Understood. Then let’s split the problem:

  1. PR to fix behavior for scalar date-like objects, including pd.NaT (will do this before weekend)
  2. PR to fix behavior for iterables: infer types for list/tuples and use current approach for np.array when type is number-like. Approach for mixed types TBF (open to discuss and implement)
0 reactions
mroeschke commented, Sep 1, 2021

"mapping this function element by element for iterables seems to be the easiest solution"

This will kill performance for the existing cases, so that implementation is probably a non-starter.
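The performance concern can be checked with a rough timing sketch like the one below. The input data (a Series of numeric strings) and the function names are assumptions chosen for illustration, and the actual timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Illustrative input: an "existing case" that pd.to_numeric already handles in one pass.
numeric_strings = pd.Series([str(i) for i in range(5_000)])


def vectorized():
    # Current approach: convert the whole Series in a single call.
    return pd.to_numeric(numeric_strings)


def element_wise():
    # Alternative under discussion: call pd.to_numeric once per element.
    return numeric_strings.apply(pd.to_numeric)


vectorized_seconds = timeit.timeit(vectorized, number=3)
element_wise_seconds = timeit.timeit(element_wise, number=3)
print(f"vectorized:   {vectorized_seconds:.4f} s")
print(f"element-wise: {element_wise_seconds:.4f} s")
```

Both paths return the same numbers here; the difference is only in how much per-call overhead is paid.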


