read_parquet is not supported for partitioned parquet
Split from #626. read_parquet is not supported for a partitioned data set.
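A minimal reproduction sketch (hypothetical paths and column names; assumes a hive-style partitioned dataset written with pyarrow) that triggers the failure on modin==0.5.0:
import pandas
import pyarrow as pa
import pyarrow.parquet as pq
import modin.pandas as pd

# Write a small hive-style partitioned dataset, producing e.g.
#   /tmp/dataset/year=2018/<part>.parquet
#   /tmp/dataset/year=2019/<part>.parquet
table = pa.Table.from_pandas(
    pandas.DataFrame({"year": [2018, 2018, 2019], "value": [1.0, 2.0, 3.0]})
)
pq.write_to_dataset(table, root_path="/tmp/dataset", partition_cols=["year"])

# Reading the partition directory back with Modin raises the
# PicklingError shown in the traceback below.
df = pd.read_parquet("/tmp/dataset")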
System information
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
$ conda --version
conda 4.6.14
$ python --version
Python 3.7.3
$ pip --version
pip 19.1 from /home/dlweber/miniconda3/envs/gis-dataprocessing/lib/python3.7/site-packages/pip (python 3.7)
$ pip freeze | grep modin
modin==0.5.0
$ pip freeze | grep pandas
pandas==0.24.2
$ pip freeze | grep numpy
numpy==1.16.3
Miniconda3 was used to install most of the SciPy stack, with a pip clause to add Modin, e.g.
# environment.yaml
channels:
- conda-forge
- defaults
dependencies:
- python>=3.7
- affine
- configobj
- dask
- numpy
- pandas
- pyarrow
- rasterio
- s3fs
- scikit-learn
- scipy
- shapely
- xarray
- pip
- pip:
- modin
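The environment can then be created with (assuming the file above is saved as environment.yaml):
$ conda env create -f environment.yaml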
Describe the problem
https://modin.readthedocs.io/en/latest/pandas_supported.html says read_parquet is supported, but apparently not for partitioned data.
Error
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 871, in _full_reduce
mapped_parts = self.data.map_across_blocks(map_func)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 209, in map_across_blocks
preprocessed_map_func = self.preprocess_func(map_func)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 100, in preprocess_func
return self._partition_class.preprocess_func(map_func)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/partition.py", line 108, in preprocess_func
return ray.put(func)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 2216, in put
worker.put_object(object_id, value)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 375, in put_object
self.store_and_register(object_id, value)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 309, in store_and_register
self.task_driver_id))
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/utils.py", line 475, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 496, in pyarrow._plasma.PlasmaClient.put
File "pyarrow/serialization.pxi", line 355, in pyarrow.lib.serialize
File "pyarrow/serialization.pxi", line 150, in pyarrow.lib.SerializationContext._serialize_callback
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle.py", line 952, in dumps
cp.dump(obj)
File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle.py", line 271, in dump
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
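Until a fix lands, one possible workaround (a sketch only, not confirmed in the issue, using the hypothetical /tmp/dataset path from the sketch above) is to read the partitioned dataset with pyarrow directly and hand the resulting pandas DataFrame to Modin:
import pyarrow.parquet as pq
import modin.pandas as pd

# pyarrow's read_table accepts a partition directory (it uses
# ParquetDataset under the hood), so load everything into pandas first...
pandas_df = pq.read_table("/tmp/dataset").to_pandas()

# ...then build a Modin DataFrame from the pandas one; Modin's
# DataFrame constructor accepts pandas data.
df = pd.DataFrame(pandas_df)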
Top GitHub Comments
@darrenleeweber I’m not too sure where the pickling error is coming from but #632 should fix both errors that you were running into
Resolved by #632
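Since the fix from #632 landed after modin==0.5.0, upgrading Modin and retrying should pick it up (a sketch; the exact fixed release is not stated in the issue, and /tmp/dataset is the hypothetical path from above):
$ pip install --upgrade modin
$ python -c "import modin.pandas as pd; print(pd.read_parquet('/tmp/dataset'))"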