DEPR: parse dates with `pyarrow` when `engine='pyarrow'` is specified
xref https://github.com/pandas-dev/pandas/pull/47962#issuecomment-1209776265
Currently, the date parsing is done by pandas after pyarrow has read the input in. Instead, the date parsing should be done by pyarrow, and pandas should let the error propagate if pyarrow can’t parse a date.
This would cause some breakage (see the linked discussion for an example), and so we should go through a deprecation cycle first
This would involve adding a `FutureWarning` to the function `_do_date_conversions`, called in https://github.com/pandas-dev/pandas/blob/7214f857e45965fe7a0cfbba1d2dc35abb0fd7f4/pandas/io/parsers/arrow_parser_wrapper.py#L111, saying that in the future, date parsing will be handed off to pyarrow itself, and therefore some unusual formats currently handled by pandas might not continue to work.
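To make the proposed deprecation cycle concrete, here is a minimal Python sketch of a warn-then-fallback conversion step. It assumes, for illustration only, that the strict parser accepts only ISO-8601 dates (`datetime.fromisoformat` stands in for it) and that pandas' current, more lenient handling can be approximated by trying a few `strptime` formats. `convert_dates`, `parse_strict`, `parse_lenient`, `LENIENT_FORMATS` and the warning text are all hypothetical; this is not the code in `arrow_parser_wrapper.py`, only the shape such a deprecation could take.

```python
import warnings
from datetime import datetime

# Hypothetical stand-in for the extra formats pandas currently tolerates.
LENIENT_FORMATS = ("%d/%m/%Y", "%d %b %Y")


def parse_strict(value: str) -> datetime:
    """Hypothetical strict parser: ISO-8601 only, raises ValueError otherwise."""
    return datetime.fromisoformat(value)


def parse_lenient(value: str) -> datetime:
    """Hypothetical approximation of pandas' more permissive parsing."""
    for fmt in LENIENT_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"could not parse date: {value!r}")


def convert_dates(values: list[str]) -> list[datetime]:
    """Hypothetical deprecation-cycle converter.

    Values the strict parser accepts are converted silently. Values that
    only the lenient fallback understands still convert, but trigger a
    single FutureWarning saying they will raise in a future version.
    Values neither parser understands raise ValueError immediately.
    """
    result = []
    needs_fallback = False
    for value in values:
        try:
            result.append(parse_strict(value))
        except ValueError:
            needs_fallback = True
            result.append(parse_lenient(value))
    if needs_fallback:
        # Hypothetical message; stacklevel=2 attributes it to the caller.
        warnings.warn(
            "Some dates could not be parsed by the strict parser and were "
            "parsed by the fallback instead. In a future version, date "
            "parsing will be handed off to pyarrow and these will raise.",
            FutureWarning,
            stacklevel=2,
        )
    return result
```

With this design, inputs the strict parser accepts never see the warning; only users relying on the fallback are warned. Once the deprecation expires, removing the fallback branch lets the strict parser's `ValueError` propagate, which matches the end state described above.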
Top GitHub Comments
Yep, the support argument makes sense; +1 on proceeding per Marco’s summary.
Yeah, I think maintaining logic that maps pyarrow behavior -> pandas behavior for CSV parsing (not just date parsing) is going to be a very large undertaking. I think the compatibility logic would eventually need to compensate for …
Generally, my philosophy (opinion) on incorporating pyarrow into pandas has been “the user should be getting pyarrow behavior, but returned as pandas data structures”. This is what I’ve been aiming towards with the `ArrowExtensionArray`, but I’m open to being wrong about this thinking.