BUG: Calling to_datetime With errrors="coerce" Still Raises After 50 Rows
See original GitHub issuePandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
from datetime import datetime
series = pd.Series(
[datetime.fromisoformat("1446-04-12 00:00:00+00:00")] +
([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * 50)
)
pd.to_datetime(series, errors="coerce", utc=True)
Issue Description
Calling to_datetime
on 51 or more rows of datetime
values causes these nested errors:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
and
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1446-04-12 00:00:00
This is despite being called with errors="coerce"
and utc=True
.
These errors do not appear if:
- There are <= 50 rows in the series
- The objects are typed differently before
to_datetime
is called:pd.to_datetime(series.astype("str"), errors="coerce", utc=True)
Note that even a simpler call like series.value_counts()
produces the same error.
Expected Behavior
51 rows with the 1991 date converted to datetime and the 1446 date converted to NaT
.
Installed Versions
1.4.0rc0:
INSTALLED VERSIONS
commit : d023ba755322e09b95fd954bbdc43f5be224688e python : 3.9.7.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22000 machine : AMD64 processor : Intel64 Family 6 Model 126 Stepping 5, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252
pandas : 1.4.0rc0 numpy : 1.22.0 pytz : 2021.3 dateutil : 2.8.2 pip : 21.2.3 setuptools : 57.4.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None xlwt : None numba : None zstandard : None
1.3.5:
INSTALLED VERSIONS
commit : 66e3805b8cabe977f40c05259cc3fcf7ead5687d python : 3.9.7.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22000 machine : AMD64 processor : Intel64 Family 6 Model 126 Stepping 5, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252
pandas : 1.3.5 numpy : 1.22.0 pytz : 2021.3 dateutil : 2.8.2 pip : 21.2.3 setuptools : 57.4.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None xlwt : None numba : None
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (2 by maintainers)
@jeremytwfortune issues are solved with pull requests from the community
take