DEPR: parse dates with `pyarrow` when `engine='pyarrow'` is specified

xref https://github.com/pandas-dev/pandas/pull/47962#issuecomment-1209776265

Currently, pandas parses dates itself after pyarrow has read the input. Instead, date parsing should be done by pyarrow, and pandas should let the error propagate if pyarrow can't parse a date.

This would cause some breakage (see the linked discussion for an example), so we should go through a deprecation cycle first.
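To make the kind of breakage concrete, here is a toy sketch using only the standard library. It is not pandas or pyarrow code: `strict_parse` stands in for a reader that accepts only ISO 8601 and lets errors propagate, `lenient_parse` stands in for a reader followed by a more forgiving second pass, and the fallback formats are hypothetical choices made for this illustration.

```python
from datetime import datetime

# Hypothetical fallback formats, chosen only for this illustration.
FALLBACK_FORMATS = ("%d/%m/%Y", "%m-%d-%Y")


def strict_parse(text):
    """Stand-in for a strict reader: ISO 8601 only, errors propagate."""
    return datetime.fromisoformat(text)


def lenient_parse(text):
    """Stand-in for a strict read followed by a forgiving second pass."""
    try:
        return strict_parse(text)
    except ValueError:
        pass
    # Try each fallback format in turn before giving up.
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"could not parse date: {text!r}")
```

Under these assumptions, an input such as `"10/08/2022"` is accepted by `lenient_parse` but raises `ValueError` in `strict_parse`, which is the shape of regression a deprecation cycle would warn users about.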

This would involve adding a `FutureWarning` to the function `_do_date_conversions`, called in https://github.com/pandas-dev/pandas/blob/7214f857e45965fe7a0cfbba1d2dc35abb0fd7f4/pandas/io/parsers/arrow_parser_wrapper.py#L111, saying that in the future date parsing will be handed off to pyarrow itself, and that some unusual formats currently handled by pandas might therefore stop working.
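A minimal sketch of what such a warning could look like, using only the standard `warnings` module. The function name `do_date_conversions_sketch`, its signature, and the message wording are all hypothetical; this is not the actual pandas implementation.

```python
import warnings


def do_date_conversions_sketch(column_names, parse_dates):
    """Hypothetical stand-in for the date-conversion step; not pandas code."""
    if parse_dates:
        # Only warn when the caller actually asked for date parsing.
        warnings.warn(
            "In a future version, date parsing with engine='pyarrow' will be "
            "done by pyarrow itself; some formats currently handled by pandas "
            "may stop working.",
            FutureWarning,
            stacklevel=2,
        )
    # A real implementation would convert the columns here; this sketch
    # only returns the names of the columns that would be converted.
    return [name for name in column_names if name in parse_dates]
```

Guarding the warning on `parse_dates` keeps it silent for callers who never request date parsing, and `stacklevel=2` attributes the warning to the caller rather than to the helper itself.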


Top GitHub Comments

jreback commented, Aug 11, 2022

Yep, the support argument makes sense. +1 on proceeding per Marco's summary.

mroeschke commented, Aug 10, 2022

Yeah, I think maintaining logic that maps pyarrow behavior to pandas behavior for csv parsing (not just date parsing) is going to be a very large undertaking. I think the compatibility logic would eventually need to compensate for:

1. Different pyarrow versions
2. Value-dependent behavior (i.e. knowing the input value, what pyarrow returned, and what pandas “should” return)
3. The combination of the above points

My philosophy (opinion) on incorporating pyarrow into pandas has generally been “the user should be getting pyarrow behavior but returned as pandas data structures”. This is what I’ve been aiming towards with the ArrowExtensionArray, but I'm open to being wrong about this thinking.