pd.to_datetime() throws if caching is on with Null-like arguments
Code Sample, a copy-pastable example if possible
import pandas as pd
result = pd.to_datetime([pd.NaT, None], cache=True)
Problem description
It results in an error:
…
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3242
   3243         if not self.is_unique:
-> 3244             raise InvalidIndexError('Reindexing only valid with uniquely'
   3245                                     ' valued Index objects')
   3246
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Expected Output
The same as result = pd.to_datetime([pd.NaT, None], cache=False):
DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
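To make the comparison between the two code paths concrete, here is a minimal sketch that runs the same input with and without the cache. The try/except wrapper is an addition for illustration only, so the script finishes on both affected and fixed pandas versions; it is not part of the original report.

```python
import pandas as pd

values = [pd.NaT, None]

# Reference result: conversion without the cache, as in "Expected Output".
expected = pd.to_datetime(values, cache=False)

try:
    # On affected versions (0.23.4 in this report) this call raises.
    cached = pd.to_datetime(values, cache=True)
except Exception as exc:
    cached = None
    print(f"cache=True raised {type(exc).__name__}: {exc}")
else:
    print("cache=True matches cache=False:", cached.equals(expected))
```

On an affected installation the except branch should report the InvalidIndexError shown above; on a version where the bug is fixed, both calls should return the same all-NaT DatetimeIndex.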
Output of pd.show_versions()
pandas: 0.23.4
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.28.3
numpy: 1.13.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Comments:6 (5 by maintainers)
Top GitHub Comments
Please raise a new issue with the example.
Hello guys,
It looks like this bug is back in business in the latest version, but a bit harder to trigger:
pandas versions
>>> pd.show_versions()
INSTALLED VERSIONS
commit : f2ca0a2665b2d169c97de87b8e778dbed86aea07
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.15-200.fc32.x86_64
Version : #1 SMP Tue Aug 11 16:36:14 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.10.0
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : None
How to reproduce:
The key here is to have enough entries in the Series to trigger the caching system.
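As a rough illustration of what "enough entries" might look like, the sketch below builds a Series that should be long and repetitive enough for the cache to be used. It rests on my reading of pandas internals rather than on anything stated in this thread: I am assuming the cache is only considered for inputs longer than about 50 elements with a low share of unique values. The data is hypothetical and is not the commenter's reproduction.

```python
import pandas as pd

# Hypothetical input: 70 entries, only three distinct values.
# Assumption: more than ~50 elements are needed before the cache is used.
values = [pd.NaT, None] * 30 + ["2020-08-24"] * 10
series = pd.Series(values)

# Reference conversion without the cache.
baseline = pd.to_datetime(series, cache=False)

try:
    # cache=True is the default in recent versions; spelled out for clarity.
    cached = pd.to_datetime(series, cache=True)
except Exception as exc:
    cached = None
    print(f"cache=True raised {type(exc).__name__}: {exc}")
else:
    print("cache=True matches cache=False:", cached.equals(baseline))
```

Whether this particular input triggers the error depends on the installed pandas version; the point is only that a two-element list is too short to take the cached path in newer releases.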