
Can't load file from s3 bucket

See original GitHub issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS (Darwin 18.7.0)
  • Modin version (modin.__version__): ‘0.7.4’
  • Python version: 3.7
  • Code we can use to reproduce:
import modin.pandas as mpd
import pandas as pd

path = "s3://bucket_name/data/dataframe.snappy.parquet"
df = pd.read_parquet(path)    # works
df2 = mpd.read_parquet(path)  # raises FileNotFoundError (see log below)

Describe the problem

I can’t load snappy.parquet files from S3, while plain pandas reads the same path fine. Does Modin support snappy.parquet files?
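
One possible workaround while this is open, based only on the observation above that plain pandas reads the path fine: load the file with pandas and wrap the result in a Modin DataFrame. This is a sketch rather than a proper fix, and it gives up Modin's parallel read of the file.

import pandas as pd
import modin.pandas as mpd

path = "s3://bucket_name/data/dataframe.snappy.parquet"

# Plain pandas handles the s3:// URL (as reported above), so read with it
# first, then hand the result to Modin to keep the rest of the workflow
# on the Modin API.
pdf = pd.read_parquet(path)
df = mpd.DataFrame(pdf)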

Source code / logs

Error log:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-14-814dc08ef229> in <module>
----> 1 df2 = mpd.read_parquet(path)

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/pandas/io.py in read_parquet(path, engine, columns, **kwargs)
     40     return DataFrame(
     41         query_compiler=EngineDispatcher.read_parquet(
---> 42             path=path, columns=columns, engine=engine, **kwargs
     43         )
     44     )

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/data_management/dispatcher.py in read_parquet(cls, **kwargs)
    105     @classmethod
    106     def read_parquet(cls, **kwargs):
--> 107         return cls.__engine._read_parquet(**kwargs)
    108 
    109     @classmethod

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/data_management/factories.py in _read_parquet(cls, **kwargs)
     46     @classmethod
     47     def _read_parquet(cls, **kwargs):
---> 48         return cls.io_cls.read_parquet(**kwargs)
     49 
     50     @classmethod

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/engines/base/io/file_reader.py in read(cls, *args, **kwargs)
     27     @classmethod
     28     def read(cls, *args, **kwargs):
---> 29         query_compiler = cls._read(*args, **kwargs)
     30         # TODO (devin-petersohn): Make this section more general for non-pandas kernel
     31         # implementations.

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/modin/engines/base/io/column_stores/parquet_reader.py in _read(cls, path, engine, columns, **kwargs)
     68                 column_names = pd.schema.names
     69             else:
---> 70                 meta = ParquetFile(path).metadata
     71                 column_names = meta.schema.names
     72             if meta is not None:

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/parquet.py in __init__(self, source, metadata, common_metadata, read_dictionary, memory_map, buffer_size)
    135         self.reader.open(source, use_memory_map=memory_map,
    136                          buffer_size=buffer_size,
--> 137                          read_dictionary=read_dictionary, metadata=metadata)
    138         self.common_metadata = common_metadata
    139         self._nested_paths_by_prefix = self._build_nested_paths()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.open()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.get_reader()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib._get_native_file()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.OSFile.__cinit__()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/io.pxi in pyarrow.lib.OSFile._open_readable()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

/opt/anaconda3/envs/myenv/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

FileNotFoundError: [Errno 2] Failed to open local file 's3://bucket_name/data/dataframe.snappy.parquet'. Detail: [errno 2] No such file or directory
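
The traceback points at the root cause: parquet_reader.py passes the raw s3:// string straight to pyarrow.parquet.ParquetFile, which resolves it as a local OSFile and therefore raises FileNotFoundError. Handing pyarrow an already opened file object avoids that local-path lookup. The sketch below only illustrates that point, not Modin's eventual fix; it assumes s3fs is installed and AWS credentials are configured.

import s3fs
from pyarrow.parquet import ParquetFile

path = "s3://bucket_name/data/dataframe.snappy.parquet"

# What the traceback shows Modin doing -- the URL is treated as a local path:
#   ParquetFile(path)  ->  FileNotFoundError: Failed to open local file ...

# Opening the object through s3fs first gives pyarrow a readable handle,
# so the metadata/schema read that failed above succeeds.
fs = s3fs.S3FileSystem()
with fs.open(path, "rb") as f:
    meta = ParquetFile(f).metadata
    column_names = meta.schema.names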

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
devin-petersohn commented, Jul 22, 2020

@DenisVorotyntsev I am going to reopen this so we don’t lose track of it 😄

0 reactions
prutskov commented, Sep 23, 2020

I’ll try to use open_file for parquet files.
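
The open_file helper mentioned here is not shown in this thread, so the snippet below only sketches the general idea using fsspec (the helper name and exact API are assumptions, not Modin's actual code): resolve the URL through a filesystem layer so pyarrow gets a file handle instead of a path string. For s3:// URLs this still requires s3fs to be installed.

import fsspec
from pyarrow.parquet import ParquetFile

def read_parquet_column_names(path):
    # fsspec dispatches on the URL scheme (s3://, file://, ...) and returns
    # a file-like object that pyarrow can read the footer/metadata from.
    with fsspec.open(path, "rb") as f:
        return ParquetFile(f).metadata.schema.names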

Read more comments on GitHub >

Top Results From Across the Web

Resolve errors uploading data to or downloading data ... - AWS
Your file does not exist. Confirm that the file exists in your S3 bucket, and that the name you specified in your script...

Troubleshoot Amazon S3 content loading issue - AWS re:Post
I'm using an Amazon Simple Storage Service (Amazon S3) bucket to store content for my website. A user from another AWS account uploaded...

unable to read large csv file from s3 bucket to python
Make sure the region of the S3 bucket is the same as your AWS configure. · Make sure the...

Can't download individual files from S3 bucket #13586
Connect to the S3 bucket · Select a directory/folder · Right click > Download.

Bulk Loading from Amazon S3 - Snowflake Documentation
If the S3 bucket referenced by your external stage is in the same region as your Snowflake account, your network traffic does not...
