to_datetime function not working with %Y.%m.%d %H format
Code Sample, a copy-pastable example if possible
import pandas as pd
import datetime as dt
print(dt.datetime.strptime('2012.01.01 1', '%Y.%m.%d %H'))
# -> 2012-01-01 01:00:00
print(pd.to_datetime('2012.01.01 1', format='%Y.%m.%d %H'))
# -> ValueError: time data '2012.01.01 1' doesn't match format specified
Problem description
When using to_datetime to parse a date string that includes an hour component but no minutes or seconds, with a format that is otherwise similar to ISO 8601 (such as '%Y.%m.%d %H'), a ValueError is raised (see above). This is unexpected because datetime.strptime parses the same string without any problem using the same format string (see above).
I suspect the problem is in the _format_is_iso function of pandas._libs.tslibs.parsing, which only checks whether an ISO format template starts with the given format, so this format is recognized as ISO-like. In that case, the format passed to to_datetime is ignored and the string is parsed by tslib.array_to_datetime instead, which does not seem to handle this kind of input.
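The suspected prefix check can be illustrated with a simplified, stdlib-only sketch. The function name looks_iso_like, the template, and the separator lists below are assumptions for illustration; they do not reproduce pandas' actual _format_is_iso implementation. The point is only that a "does the ISO template start with this format?" test classifies '%Y.%m.%d %H' as ISO-like.

```python
# Hypothetical, simplified re-creation of the prefix-style check described above.
# NOT pandas' actual code; the template and separator lists are assumptions.
ISO_TEMPLATE = "%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S"


def looks_iso_like(fmt: str) -> bool:
    """Return True if `fmt` is a prefix of some ISO-like template."""
    for date_sep in ["-", "/", ".", " ", ""]:
        for time_sep in [" ", "T"]:
            template = ISO_TEMPLATE.format(date_sep=date_sep, time_sep=time_sep)
            if template.startswith(fmt):
                return True
    return False


# '%Y.%m.%d %H' is a prefix of '%Y.%m.%d %H:%M:%S', so it is treated as ISO-like,
# even though a string like '2012.01.01 1' has no ':%M' part at all.
print(looks_iso_like("%Y.%m.%d %H"))  # True
print(looks_iso_like("%d.%m.%Y %H"))  # False
```

Because the check only looks at the format string, it cannot know that the actual inputs stop after a single-digit hour.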
My current workaround is to modify the dates so they also have a minutes component (append ':00' to every string), so that they can be parsed.
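A minimal sketch of that workaround, assuming the dates are held in a pandas Series (the sample values are illustrative): append ':00' to each string and extend the format with ':%M' so the format and the strings still match.

```python
import pandas as pd

# Dates that carry only an hour component, as in the report (sample values).
raw = pd.Series(["2012.01.01 1", "2012.01.02 13"])

# Workaround: add a minutes field to every string and to the format.
parsed = pd.to_datetime(raw + ":00", format="%Y.%m.%d %H:%M")
print(parsed)
```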
Expected Output
2012-01-01 01:00:00
(same as when using strptime)
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.2
Cython: None
numpy: 1.14.2
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Comments: 14 (12 by maintainers)
To me it’s only a question of whether we consider "2016.05.06 1" an ISO 8601 (loosely, as we use it) formatted string, i.e. whether it parses even if no format is passed. Even if the answer to that is no, we need to match Python if an explicit format spec is passed, so some change is necessary either way. It’s a bug for pd.to_datetime(..., format='format') not to match datetime.strptime(..., 'format').
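On pandas versions affected by this issue, one way to get exactly the datetime.strptime behaviour for an explicit format is to parse each string with strptime and pass the resulting datetime objects to pd.to_datetime. This is a minimal sketch; the helper name to_datetime_strptime is hypothetical, and the per-element loop trades the speed of vectorized parsing for strptime semantics.

```python
import datetime as dt

import pandas as pd


def to_datetime_strptime(values, fmt):
    """Hypothetical helper: parse every string with datetime.strptime.

    pandas' own string-parsing paths are bypassed, so `fmt` is interpreted
    exactly as strptime interprets it; pd.to_datetime only receives
    already-parsed datetime objects.
    """
    return pd.to_datetime([dt.datetime.strptime(v, fmt) for v in values])


result = to_datetime_strptime(["2012.01.01 1"], "%Y.%m.%d %H")
print(result)
```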
This’ll be addressed in https://github.com/pandas-dev/pandas/pull/50242
thanks for the report!