Deadlock in the interaction between `pyarrow.filesystem.S3FSWrapper` and `s3fs.core.S3FileSystem`

What happened: Some interaction between s3fs, pyarrow, and petastorm causes deadlock

What you expected to happen: s3fs to be threadsafe, if pyarrow is using it that way

Minimal Complete Verifiable Example:

import pyarrow.parquet as pq
from petastorm.fs_utils import get_filesystem_and_path_or_paths, normalize_dir_url

dataset_url = 's3://<redacted>'

# Repeat basic steps that make_reader or make_batch_reader normally does
dataset_url = normalize_dir_url(dataset_url)
fs, path = get_filesystem_and_path_or_paths(dataset_url)

# Finished in seconds
dataset = pq.ParquetDataset(path, filesystem=fs, metadata_nthreads=1)
# Hung all night
dataset = pq.ParquetDataset(path, filesystem=fs, metadata_nthreads=10)

# Their code
>>> type(fs)
<class 'pyarrow.filesystem.S3FSWrapper'>
# Your code
>>> type(fs.fs)
<class 's3fs.core.S3FileSystem'>
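
When a call hangs like this, it can help to bound the wait and capture where every thread is blocked. The following is a minimal sketch, not part of the original report: `run_with_watchdog` is a hypothetical helper name, and it relies only on the standard-library `threading` and `faulthandler` modules. It runs a callable in a daemon thread and, if the callable has not returned by the deadline, dumps the stack of all threads.

```python
import faulthandler
import sys
import threading


def run_with_watchdog(fn, timeout_s, dump_file=sys.stderr):
    """Run fn() in a daemon thread (hypothetical helper, for illustration).

    Returns (True, result) if fn finished within timeout_s seconds.
    Otherwise dumps all thread stacks to dump_file and returns (False, None).
    dump_file must be backed by a real file descriptor.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except BaseException as exc:  # re-raised in the caller below
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # Still running after the deadline: show where each thread is blocked.
        faulthandler.dump_traceback(file=dump_file, all_threads=True)
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]
```

For the example above, `fn` could be `lambda: pq.ParquetDataset(path, filesystem=fs, metadata_nthreads=10)`. The worker is a daemon thread, so a call that never returns does not keep the interpreter alive at exit.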

Anything else we need to know?:

If your code is not threadsafe, that would appear to be news to pyarrow. Also reported to Petastorm. Will be reported to PyArrow.

Environment:

  • s3fs version (entered in the template's "Dask version" field): 0.4.2
  • Python version: 3.7.8
  • Operating System: Mac OS 10.15.6
  • Install method (conda, pip, source): pip install s3fs==0.4.2
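
Because three libraries are involved, it may be useful to record the installed version of each one alongside the environment details. This is a small sketch under stated assumptions: `collect_versions` is a hypothetical helper, the package names are the ones mentioned in this report, and `importlib.metadata` requires Python 3.8 or later (the environment above is 3.7.8, where the `importlib_metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library in Python 3.8+


def collect_versions(packages=("s3fs", "pyarrow", "petastorm")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```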

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 22 (5 by maintainers)

Top GitHub Comments

dmcguire81 commented, Sep 17, 2020 (1 reaction)

Reported as ARROW-10029.

jorisvandenbossche commented, Sep 17, 2020 (0 reactions)

Cool, thanks for further looking into it and figuring it out @dmcguire81 !


Top Results From Across the Web

Deadlock in the interaction between `pyarrow.filesystem. ...
S3FSWrapper and s3fs.core.S3FileSystem #365 ... What happened: Some interaction between s3fs, pyarrow, and petastorm causes deadlock.

[Python] Deadlock in the interaction of pyarrow FileSystem ...
@martindurant good news (for you): I have a repro test case that is 100% pyarrow, so it looks like s3fs is not involved....

Apache Arrow 6.0.1 (2021-11-18)
ARROW-10921 - `TypeError: 'coroutine' object is not iterable` when reading parquet partitions via s3fs >= 0.5 with pyarrow ...

Source - GitHub
... [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems ... [C++] Fix copying objects with special characters on S3FS ...

Apache Arrow 3.0.0 Release
... split packages in arrow-memory-core and arrow-vectors ARROW-10345 ... [Python] pyarrow doesn't work with s3fs>=0.5 ARROW-10434 - [Rust] ...
