DEPR: parse dates with `pyarrow` when `engine='pyarrow'` is specified

xref https://github.com/pandas-dev/pandas/pull/47962#issuecomment-1209776265

Currently, date parsing is done by pandas after pyarrow has read the input. Instead, the date parsing should be done by pyarrow, and pandas should let an error be thrown if pyarrow can’t parse a date.

This would cause some breakage (see the linked discussion for an example), so we should go through a deprecation cycle first.

This would involve adding a FutureWarning to the function `_do_date_conversions`, called in https://github.com/pandas-dev/pandas/blob/7214f857e45965fe7a0cfbba1d2dc35abb0fd7f4/pandas/io/parsers/arrow_parser_wrapper.py#L111, saying that in the future date parsing will be handed off to pyarrow itself, and that some unusual formats currently handled by pandas might therefore stop working.
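A minimal sketch of what emitting such a warning might look like, using only the standard library. The helper name `warn_pyarrow_date_parsing` and the message wording are hypothetical; they are not taken from the pandas codebase.

```python
import warnings


def warn_pyarrow_date_parsing(stacklevel: int = 2) -> None:
    """Hypothetical helper (not part of pandas) that emits the deprecation notice."""
    warnings.warn(
        "In a future version, date parsing with engine='pyarrow' will be done "
        "by pyarrow itself; some formats currently handled by pandas may no "
        "longer be parsed.",
        FutureWarning,
        stacklevel=stacklevel,
    )


# Record the warning so its category and message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_pyarrow_date_parsing()
```

In practice the `stacklevel` would need to be chosen so the warning points at the user's `read_csv` call and not at pandas internals.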

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
jreback commented, Aug 11, 2022

Yep, the support argument makes sense. +1 on proceeding per Marco's summary.

1 reaction
mroeschke commented, Aug 10, 2022

Yeah, I think maintaining logic that maps pyarrow behavior to pandas behavior for csv parsing (not just date parsing) is going to be a very large undertaking. I think the compatibility logic would eventually need to compensate for:

  • Different pyarrow versions
  • Value dependent behavior (i.e. knowing the input value, what pyarrow returned, and what pandas “should” return)
  • The combination of the above points

Generally, my philosophy (opinion) on incorporating pyarrow into pandas has been “the user should be getting pyarrow behavior but returned as pandas data structures”. This is what I’ve been aiming towards with the ArrowExtensionArray, but I'm open to being wrong about this thinking.
