Exception in interaction between `pyarrow.filesystem.S3FSWrapper` and `s3fs.core.S3FileSystem`
What happened: Using s3fs with pyarrow within petastorm throws an exception (TypeError: 'coroutine' object is not iterable) in multiple places that use pyarrow.parquet.ParquetDataset (petastorm.reader.make_reader, petastorm.reader.make_batch_reader, etc.).
What you expected to happen: No exception
Minimal Complete Verifiable Example:
import pyarrow.parquet as pq
from s3fs import S3FileSystem
from pyarrow.filesystem import S3FSWrapper
fs = S3FSWrapper(S3FileSystem())
dataset_url = "s3://our-bucket/series/of/prefixes/partition1=foo/partition2=bar"
# This throws the relevant exception
dataset = pq.ParquetDataset(dataset_url, filesystem=fs)
./env/lib/python3.7/site-packages/pyarrow/filesystem.py:394: RuntimeWarning: coroutine 'S3FileSystem._ls' was never awaited
for key in list(self.fs._ls(path, refresh=refresh)):
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "./env/lib/python3.7/site-packages/pyarrow/parquet.py", line 1170, in __init__
open_file_func=partial(_open_dataset_file, self._metadata)
File "./env/lib/python3.7/site-packages/pyarrow/parquet.py", line 1348, in _make_manifest
metadata_nthreads=metadata_nthreads)
File "./env/lib/python3.7/site-packages/pyarrow/parquet.py", line 927, in __init__
self._visit_level(0, self.dirpath, [])
File "./env/lib/python3.7/site-packages/pyarrow/parquet.py", line 942, in _visit_level
_, directories, files = next(fs.walk(base_path))
File "./env/lib/python3.7/site-packages/pyarrow/filesystem.py", line 394, in walk
for key in list(self.fs._ls(path, refresh=refresh)):
TypeError: 'coroutine' object is not iterable
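The failure mode matches what happens in Python whenever an async method is called from synchronous code: the call returns a coroutine object, which cannot be iterated. A minimal stdlib-only sketch (a stub class, not the real s3fs code) reproducing the same TypeError:

```python
import asyncio

# Stub standing in for an async filesystem method like S3FileSystem._ls
# (hypothetical; not the real s3fs implementation).
class AsyncFS:
    async def _ls(self, path, refresh=False):
        return [f"{path}/a", f"{path}/b"]

fs = AsyncFS()

coro = fs._ls("bucket")  # returns a coroutine, not a list
try:
    list(coro)           # what S3FSWrapper.walk effectively attempts
except TypeError as err:
    print(err)           # 'coroutine' object is not iterable
finally:
    coro.close()         # avoid the "never awaited" RuntimeWarning

# Driving the coroutine to completion yields the actual listing:
print(asyncio.run(fs._ls("bucket")))  # ['bucket/a', 'bucket/b']
```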
Anything else we need to know?: I’m having trouble navigating the bug-reporting process for Apache Arrow; I’d appreciate it if you’re able to pass this report on to them.
Environment:
- Dask version: 0.5.1
- Python version: 3.7.8
- Operating System: Mac OS 10.15.6
- Install method (conda, pip, source): pip install s3fs==0.5.1
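One possible workaround (an assumption here, based on this issue appearing with the asynchronous rewrite introduced in s3fs 0.5) is to pin the last pre-async s3fs release until the incompatibility is resolved upstream:

```shell
# Hypothetical workaround: pin s3fs to a release predating the async rewrite.
pip install "s3fs<0.5"
```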
Issue Analytics
- Created 3 years ago
- Comments: 13 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
You can turn on logging in s3fs to see what the calls are
I’ll follow up with petastorm to have them remove the wrapper, if pyarrow isn’t going to (or doesn’t need to) maintain it.
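If the wrapper were kept, the fix direction is mechanical: walk would need to drive the coroutine returned by `_ls` to completion instead of iterating it directly. A hypothetical sketch with stub classes (not the real pyarrow or s3fs code; real s3fs bridges sync and async callers with its own event-loop helpers, so `asyncio.run` here is a simplification):

```python
import asyncio

# Stub async backend standing in for s3fs.S3FileSystem (hypothetical).
class AsyncBackend:
    async def _ls(self, path, refresh=False):
        return [f"{path}/key1", f"{path}/key2"]

# Sketch of a wrapper bridging synchronous callers to the async backend.
class SyncWrapper:
    def __init__(self, fs):
        self.fs = fs

    def ls(self, path, refresh=False):
        # Run the coroutine to completion rather than iterating it.
        return asyncio.run(self.fs._ls(path, refresh=refresh))

wrapper = SyncWrapper(AsyncBackend())
print(wrapper.ls("bucket"))  # ['bucket/key1', 'bucket/key2']
```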