to_datetime() returns different values with and without errors='coerce'


Code Sample

Without errors='coerce'

>>> import pandas as pd
>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
                          ts
0     2019-02-02 08:07:13+00
1  2019-02-02 08:03:22.54+00
>>> df.dtypes
ts    object
dtype: object

>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True)
>>> df
                                ts
0        2019-02-02 08:07:13+00:00
1 2019-02-02 08:03:22.540000+00:00
>>> df.dtypes
ts    datetime64[ns, UTC]
dtype: object

With errors='coerce'

>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
                          ts
0     2019-02-02 08:07:13+00
1  2019-02-02 08:03:22.54+00
>>> df.dtypes
ts    object
dtype: object
>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True, errors='coerce')
>>> df
                   ts
0 2019-02-02 08:07:13
1                 NaT

Problem description

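The behavior of to_datetime() with errors='coerce' differs from its behavior without it. Without errors='coerce', both values are parsed as datetime64[ns, UTC]; with it, the second value becomes NaT and the first loses its +00:00 offset. If I understand some of the other issues raised on this topic correctly, the behavior differs in some cases by design. In this case, however, the two dates are very similar and differ only in format (the second has fractional seconds).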
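One plausible reading of the two transcripts, consistent with the maintainer comments later in the thread: the format is inferred from the first element only, the second element (with fractional seconds) does not match it, and the errors setting decides what happens next. The sketch below is a stdlib-only model of that behavior, not pandas' implementation; the candidate format list, the helper names, and the fall-back-on-raise step are assumptions chosen to reproduce the outputs shown above.

```python
import re
from datetime import datetime, timezone

# Hypothetical candidate formats; pandas' real format guesser is more elaborate.
CANDIDATE_FORMATS = ["%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S.%f%z"]


def normalize_offset(value):
    """Turn a bare hour offset like '+00' into '+0000', which strptime's %z accepts."""
    return re.sub(r"([+-]\d\d)$", r"\g<1>00", value)


def guess_format(value):
    """Return the first candidate format that parses `value`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def flexible_parse(value):
    """Assumed fallback: pick a format for this one value instead of reusing one."""
    fmt = guess_format(value)
    if fmt is None:
        raise ValueError(f"cannot parse {value!r}")
    return datetime.strptime(value, fmt)


def to_datetimes(values, errors="raise"):
    """Hypothetical model: strict parsing with a format inferred from values[0]."""
    values = [normalize_offset(v) for v in values]
    fmt = guess_format(values[0])  # inferred from the first element only
    if fmt is None:
        raise ValueError(f"cannot infer a format from {values[0]!r}")
    parsed = []
    for v in values:
        try:
            parsed.append(datetime.strptime(v, fmt))
        except ValueError:
            if errors == "coerce":
                parsed.append(None)  # mismatch becomes a missing value; no fallback
            else:
                # errors="raise": the mismatch triggers a retry with the flexible parser
                return [flexible_parse(x) for x in values]
    return parsed


ts = ["2019-02-02 08:07:13+00", "2019-02-02 08:03:22.54+00"]
print(to_datetimes(ts))                   # both values parsed
print(to_datetimes(ts, errors="coerce"))  # second value is None
```

In this model the asymmetry comes from where the mismatch is handled: under "raise" it surfaces as an exception that a fallback can catch, while under "coerce" it is swallowed element by element, so the fallback never runs.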

Expected Output

>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
                          ts
0     2019-02-02 08:07:13+00
1  2019-02-02 08:03:22.54+00
>>> df.dtypes
ts    object
dtype: object

>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True, errors='coerce')
>>> df
                                ts
0        2019-02-02 08:07:13+00:00
1 2019-02-02 08:03:22.540000+00:00
>>> df.dtypes
ts    datetime64[ns, UTC]
dtype: object
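On pandas 2.0 or later (an assumption; the issue was reported on 0.24.1), one way to get the expected output is to stop pandas from reusing the first element's format for every value. A minimal sketch using format="mixed", which infers the format for each element individually, together with utc=True:

```python
import pandas as pd

ts = pd.Series(["2019-02-02 08:07:13+00", "2019-02-02 08:03:22.54+00"])

# format="mixed" (pandas >= 2.0) infers the format per element instead of
# locking in the first element's format; utc=True returns UTC-aware values.
parsed = pd.to_datetime(ts, format="mixed", utc=True, errors="coerce")
print(parsed)
print(parsed.dtype)
```

format="ISO8601" is a stricter pandas 2.0+ alternative when every input is an ISO 8601 string; format="mixed" is more permissive and can guess wrongly on ambiguous strings.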

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 3.8.0
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

Created 5 years ago
Comments: 14 (6 by maintainers)

Top GitHub Comments


MarcoGorelli commented, Dec 19, 2022

That was a bug, and it was fixed as part of the PDEP-4 change.

On the main branch, you’d get an error:

In [3]: pandas.to_datetime(['01-May-2021 00:00:00', '01-Sep-2021 00:00:00'], infer_datetime_format=True, errors="raise")
<ipython-input-3-08582d0c0651>:1: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  pandas.to_datetime(['01-May-2021 00:00:00', '01-Sep-2021 00:00:00'], infer_datetime_format=True, errors="raise")
---------------------------------------------------------------------------
ValueError: time data '01-Sep-2021 00:00:00' does not match format '%d-%B-%Y %H:%M:%S' (match)

This behaviour will be available in pandas 2.0.0, which’ll hopefully come out around February.
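The error above comes from the inferred format using %B (full month name). "May" is both the full and the abbreviated month name, so the first element matches %B, while "Sep" only matches the abbreviated form %b. The same effect can be reproduced with Python's own strptime; this is a stdlib illustration, not pandas' parser, and it assumes the default C/English locale. The helper try_parse is hypothetical.

```python
from datetime import datetime

# Format pandas reported inferring from the first element (see the error above).
INFERRED = "%d-%B-%Y %H:%M:%S"  # %B = full month name ("May", "September")


def try_parse(value, fmt):
    """Hypothetical helper: return a datetime, or None if `value` does not match."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


first = try_parse("01-May-2021 00:00:00", INFERRED)   # "May" is also a full month name
second = try_parse("01-Sep-2021 00:00:00", INFERRED)  # "Sep" is only an abbreviation
fixed = try_parse("01-Sep-2021 00:00:00", "%d-%b-%Y %H:%M:%S")  # %b = abbreviated name
print(first, second, fixed)
```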


evgeniikozlov commented, Dec 19, 2022

@MarcoGorelli thanks for the explanation about “May” parsing, but it is still unclear why it works if errors="raise" (my second example). It is very confusing that the errors option has any effect on parsing in this case.
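Because errors='coerce' turns unparseable values into NaT without any message, one way to see what it discarded is to compare the missing values in the input with those in the output. A minimal sketch; the sample data, including the "not a date" entry, is made up for illustration.

```python
import pandas as pd

# Hypothetical input: one valid timestamp, one unparseable string, one missing value.
raw = pd.Series(["2019-02-02 08:07:13+00", "not a date", None])
parsed = pd.to_datetime(raw, errors="coerce", utc=True)

# Present in the input but NaT in the output: these were coerced silently.
coerced_mask = raw.notna() & parsed.isna()
print(raw[coerced_mask])
```

Running such a check after a coerced conversion makes a silent format mismatch, like the NaT in the original report, visible instead of leaving it buried in the data.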
