Can't load file from s3 bucket
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 18.7.0
- Modin version (modin.__version__): 0.7.4
- Python version: 3.7
- Code we can use to reproduce:
import modin.pandas as mpd
import pandas as pd
path = "s3://bucket_name/data/dataframe.snappy.parquet"
df = pd.read_parquet(path) # works
df2 = mpd.read_parquet(path)
Describe the problem
I can’t load snappy.parquet files from S3; pandas works fine with the same path. Does Modin support snappy.parquet files?
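As an interim workaround (a minimal sketch, assuming the pandas code path keeps working as reported above; the bucket path is the same placeholder), the file can be read with pandas and then wrapped in a Modin DataFrame:

import pandas as pd
import modin.pandas as mpd

path = "s3://bucket_name/data/dataframe.snappy.parquet"
pandas_df = pd.read_parquet(path)     # works, per the report above
modin_df = mpd.DataFrame(pandas_df)   # hand the result to Modin

Note that this loads the whole file on the driver before distributing it, so it gives up Modin's parallel read; it is only a stopgap until read_parquet handles s3:// paths directly.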
Source code / logs
Error log:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-14-814dc08ef229> in <module>
----> 1 df2 = mpd.read_parquet(path)
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/pandas/io.py in read_parquet(path, engine, columns, **kwargs)
40 return DataFrame(
41 query_compiler=EngineDispatcher.read_parquet(
---> 42 path=path, columns=columns, engine=engine, **kwargs
43 )
44 )
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/data_management/dispatcher.py in read_parquet(cls, **kwargs)
105 @classmethod
106 def read_parquet(cls, **kwargs):
--> 107 return cls.__engine._read_parquet(**kwargs)
108
109 @classmethod
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/data_management/factories.py in _read_parquet(cls, **kwargs)
46 @classmethod
47 def _read_parquet(cls, **kwargs):
---> 48 return cls.io_cls.read_parquet(**kwargs)
49
50 @classmethod
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/engines/base/io/file_reader.py in read(cls, *args, **kwargs)
27 @classmethod
28 def read(cls, *args, **kwargs):
---> 29 query_compiler = cls._read(*args, **kwargs)
30 # TODO (devin-petersohn): Make this section more general for non-pandas kernel
31 # implementations.
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/engines/base/io/column_stores/parquet_reader.py in _read(cls, path, engine, columns, **kwargs)
68 column_names = pd.schema.names
69 else:
---> 70 meta = ParquetFile(path).metadata
71 column_names = meta.schema.names
72 if meta is not None:
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/parquet.py in __init__(self, source, metadata, common_metadata, read_dictionary, memory_map, buffer_size)
135 self.reader.open(source, use_memory_map=memory_map,
136 buffer_size=buffer_size,
--> 137 read_dictionary=read_dictionary, metadata=metadata)
138 self.common_metadata = common_metadata
139 self._nested_paths_by_prefix = self._build_nested_paths()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.open()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.get_reader()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib._get_native_file()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.OSFile.__cinit__()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.OSFile._open_readable()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
FileNotFoundError: [Errno 2] Failed to open local file s3://bucket_name/data/dataframe.snappy.parquet'. Detail: [errno 2] No such file or directory
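Reading the traceback, the failure happens in parquet_reader.py where the s3:// URL is passed straight to pyarrow's ParquetFile, which treats it as a local path. A minimal sketch to confirm that the object itself is reachable, assuming s3fs is installed and AWS credentials are configured (bucket and key below are placeholders):

import s3fs
from pyarrow.parquet import ParquetFile

fs = s3fs.S3FileSystem()
path = "s3://bucket_name/data/dataframe.snappy.parquet"

# ParquetFile also accepts an open file-like object; going through s3fs
# avoids the local-filesystem lookup that raises FileNotFoundError above.
with fs.open(path, "rb") as f:
    print(ParquetFile(f).metadata.schema.names)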
@DenisVorotyntsev I am going to reopen this so we don’t lose track of it 😄
I’ll try to use open_file for parquet files.
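A rough sketch of that idea (not the actual Modin patch; the fsspec usage here is my assumption, not a reference to Modin internals): open the URL through fsspec so pyarrow receives a readable file object instead of a path it would interpret as local.

import fsspec
from pyarrow.parquet import ParquetFile

def read_parquet_schema(path):
    # fsspec dispatches on the URL scheme, so "s3://..." is routed to s3fs
    with fsspec.open(path, "rb") as f:
        return ParquetFile(f).metadata.schema.names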