to_datetime() with errors=coerce and without return different values
Code Sample, a copy-pastable example if possible
Without errors='coerce'
>>> import pandas as pd
>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
ts
0 2019-02-02 08:07:13+00
1 2019-02-02 08:03:22.54+00
>>> df.dtypes
ts object
dtype: object
>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True)
>>> df
ts
0 2019-02-02 08:07:13+00:00
1 2019-02-02 08:03:22.540000+00:00
>>> df.dtypes
ts datetime64[ns, UTC]
dtype: object
With errors='coerce'
>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
ts
0 2019-02-02 08:07:13+00
1 2019-02-02 08:03:22.54+00
>>> df.dtypes
ts object
dtype: object
>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True, errors='coerce')
>>> df
ts
0 2019-02-02 08:07:13
1 NaT
Problem description
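The result of to_datetime() differs depending on whether errors='coerce' is passed: with it, the first timestamp loses its UTC offset and the second becomes NaT, whereas without it both are parsed to datetime64[ns, UTC]. If I understand some of the other issues raised on this topic correctly, the behavior is different in some cases by design. In this case, though, the two dates are very similar and differ only slightly in format (the second one has fractional seconds).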
Expected Output
>>> d = { 'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00'] }
>>> df = pd.DataFrame(data=d)
>>> d
{'ts': ['2019-02-02 08:07:13+00', '2019-02-02 08:03:22.54+00']}
>>> df
ts
0 2019-02-02 08:07:13+00
1 2019-02-02 08:03:22.54+00
>>> df.dtypes
ts object
dtype: object
>>> df['ts'] = pd.to_datetime(df['ts'], infer_datetime_format=True, errors='coerce')
>>> df
ts
0 2019-02-02 08:07:13+00:00
1 2019-02-02 08:03:22.540000+00:00
>>> df.dtypes
ts datetime64[ns, UTC]
dtype: object
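A minimal sketch of one way to get the expected output, assuming pandas 2.0 or later (the report used 0.24.1, where `format="ISO8601"` does not exist). Instead of relying on `infer_datetime_format`, it states that the strings are ISO 8601, so both the fractional and non-fractional timestamps are accepted and `errors` no longer changes the result. The `utc=True` argument is an addition of this sketch to pin the timezone to UTC.

```python
import pandas as pd

# The two timestamps from the report: the same ISO 8601 layout, except that
# the second one carries fractional seconds.
ts = pd.Series(["2019-02-02 08:07:13+00", "2019-02-02 08:03:22.54+00"])

# Assumes pandas >= 2.0: format="ISO8601" accepts any ISO 8601 string rather
# than a single strftime pattern guessed from the first element.
raised = pd.to_datetime(ts, format="ISO8601", utc=True)
coerced = pd.to_datetime(ts, format="ISO8601", utc=True, errors="coerce")

# With an explicit ISO 8601 format, both error modes yield the same values.
print(coerced)
print(raised.equals(coerced))
```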
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 3.8.0
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
That was a bug, and it was fixed as part of the PDEP-4 change.
On the main branch, you'd get an error:
This behaviour will be available in pandas 2.0.0, which will hopefully come out around February.
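A minimal sketch of the PDEP-4 behaviour, assuming pandas 2.0 or later: the format is guessed from the first non-null element and then applied to every element, so a later string with a different layout is treated as unparseable in both modes. The sample strings are hypothetical (the report's layout with the UTC offset removed), chosen so that only the fractional seconds differ.

```python
import pandas as pd

# Hypothetical sample: only the second string has fractional seconds.
s = pd.Series(["2019-02-02 08:07:13", "2019-02-02 08:03:22.54"])

# Assumes pandas >= 2.0 (PDEP-4): "%Y-%m-%d %H:%M:%S" is guessed from the
# first element and the second element does not match it.
try:
    pd.to_datetime(s)  # errors="raise" is the default
    raised_error = None
except ValueError as exc:
    raised_error = exc

# The same mismatch is turned into NaT instead of an exception.
coerced = pd.to_datetime(s, errors="coerce")

print(repr(raised_error))
print(coerced)
```

In this sketch both modes agree on which value is invalid; `errors` only decides whether that value raises or becomes NaT.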
@MarcoGorelli thanks for the explanation about "May" parsing, but it is still unclear why it works if errors="raise" (my second example). It is quite confusing why the errors option has any effect on parsing in this case.
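Because errors="coerce" turns anything it cannot parse into NaT without complaint, a parsing difference like the one in this issue can go unnoticed. A small sketch of a check for that, assuming pandas 2.0 or later for `format="ISO8601"`; the helper name `coerced_away` and the sample data are hypothetical, not part of pandas.

```python
import pandas as pd


def coerced_away(raw: pd.Series, parsed: pd.Series) -> pd.Series:
    """Hypothetical helper: flag entries that had a value in `raw` but
    became NaT after pd.to_datetime(..., errors="coerce")."""
    return raw.notna() & parsed.isna()


# Hypothetical input: one valid ISO 8601 string, one non-date, one missing.
raw = pd.Series(["2019-02-02 08:07:13+00", "not a date", None])
parsed = pd.to_datetime(raw, format="ISO8601", utc=True, errors="coerce")

bad = coerced_away(raw, parsed)
print(raw[bad])  # strings that errors="coerce" silently turned into NaT
```

Missing inputs are not flagged, because they were already NA before parsing; only values that were present and then lost are reported.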