CI/BUG: pyarrow read_csv deadlock


xref #43611, #43643

While investigating the Azure timeout issues, the deadlock appeared to occur in the parser code, which makes pyarrow a likely culprit. Tests with unusual input seem to trigger it, for example some of the parse_dates tests. A specific reproducer is this test:

pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range

I can’t reproduce this on current pyarrow, but Azure uses 0.17.0, and with that version I can reproduce a deadlock on macOS just by running pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range. It doesn’t happen consistently, but when it does, the process hangs to the point that it needs a SIGKILL to stop, which explains why pytest-timeout didn’t catch it.
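Since the hang is intermittent and reportedly survives pytest-timeout, one way to hunt for it is to run the test in a child process and kill that process from the outside. The following is a minimal sketch under that assumption, not something taken from the issue; the helper names `run_with_hard_timeout` and `hunt_for_hang` and the default attempt count and timeout are hypothetical.

```python
import subprocess
import sys


def run_with_hard_timeout(cmd, timeout_s):
    """Run cmd in a child process and kill it if it exceeds timeout_s seconds.

    Returns the child's exit code, or None if the child had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # kill() sends SIGKILL on POSIX, so it works even if the child
        # is stuck somewhere that ignores softer signals.
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


def hunt_for_hang(cmd, attempts=20, timeout_s=120):
    """Repeat cmd until one run hangs.

    Returns the 1-based attempt number that hung, or None if no run hung.
    The defaults are arbitrary illustration values.
    """
    for attempt in range(1, attempts + 1):
        if run_with_hard_timeout(cmd, timeout_s) is None:
            return attempt
    return None
```

To use it against the reproducer above, pass `[sys.executable, "-m", "pytest", "pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range"]` as `cmd` to `hunt_for_hang`.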

cc @lithomas1 if any thoughts here

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

lithomas1 commented, Sep 21, 2021 (1 reaction)

I’ll try to test this some more. It is also possible that pyarrow is getting stuck because of the TextIoWrapper that we use on our side to force pyarrow to read StringIO objects, which would be a bug on our side.
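One way to test that hypothesis would be to take the wrapper out of the picture and hand pyarrow plain bytes instead. The sketch below is an assumption about how such a check could look, not the code pandas uses; `stringio_to_bytesio` and `read_csv_without_wrapper` are hypothetical names. It relies only on `pyarrow.csv.read_csv` accepting a file-like object.

```python
import io


def stringio_to_bytesio(text_buffer, encoding="utf-8"):
    """Copy the full contents of a StringIO into a fresh BytesIO.

    This materializes everything in memory, so it is meant for small
    test inputs rather than large files.
    """
    return io.BytesIO(text_buffer.getvalue().encode(encoding))


def read_csv_without_wrapper(text_buffer):
    """Read CSV text with pyarrow from an in-memory bytes buffer."""
    # Imported lazily so the helper above works without pyarrow installed.
    from pyarrow import csv as pa_csv

    return pa_csv.read_csv(stringio_to_bytesio(text_buffer))
```

If the hang still shows up when reading this way, the wrapper is less likely to be the cause; if it goes away, the wrapper deserves a closer look.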

mroeschke commented, Nov 23, 2022 (0 reactions)

Now that our minimum version is 6.0, I believe we shouldn’t hit this issue anymore. IIRC I was experiencing this with pyarrow 2.0 and had skipped those versions in the CI due to the deadlock.

Closing since we haven’t seen this in a while, but we can reopen if this shows up again.
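Skipping affected versions, as described in this comment, could be sketched roughly as follows. The thread only reports hangs on 0.17.0 and 2.0 and a belief that 6.0 and later are fine, so the 6.0 cutoff below is an assumption, and `version_tuple` and `may_deadlock` are hypothetical helper names.

```python
def version_tuple(version):
    """Return (major, minor) as ints from a version string such as '0.17.0'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def may_deadlock(pyarrow_version, first_unaffected=(6, 0)):
    """Return True if the version is older than the assumed safe minimum.

    The (6, 0) default mirrors the minimum version mentioned in the
    comment; it is not a confirmed boundary for the bug.
    """
    return version_tuple(pyarrow_version) < first_unaffected


# In a test suite, the result could feed a skip condition, for example
# pytest.mark.skipif(may_deadlock(pyarrow.__version__), reason="...").
```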


Top Results From Across the Web

pyarrow.csv.read_csv — Apache Arrow v10.0.1
Read a Table from a stream of CSV data. Parameters: input_file: str, path or file-like object. The location of CSV ...

pyarrow.csv.ReadOptions — Apache Arrow v10.0.1
pyarrow.csv.ReadOptions · Whether to use multiple threads to accelerate reading · How much bytes to process at a time from the input stream....

Reading and Writing CSV files — Apache Arrow v10.0.1
Reading and Writing CSV files. Arrow supports reading and writing columnar data from/to CSV files. The features currently offered are the following:.

pyarrow.csv.ParseOptions — Apache Arrow v10.0.1
Parameters: delimiter: 1-character str, optional (default ','). The character delimiting individual cells in the CSV data.

pyarrow.csv.ConvertOptions — Apache Arrow v10.0.1
Explicitly map column names to column types. Passing this argument disables type inference on the defined columns. null_values: list, optional. A sequence...
