RayTaskError while using pd.read_feather()
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux watsnet 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Modin version (modin.__version__): 0.8.2, 0.10.1
- Python version: Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20) [GCC 9.3.0] on linux
- Code we can use to reproduce:
- Download _default_rule_obj.feather
import modin.pandas as pd
pd.read_feather('_default_rule_obj.feather')
Describe the problem
Source code / logs
@pytest.mark.dependency(depends=["test_apply_default_rule"])
def test_apply_within_chapter_rule() -> None:
    config = test_load_config()
    with open(test_data_obj_path, 'rb') as _input1:
        data_obj = pickle.load(_input1)
    x_df = pd.read_feather(test_default_rule_obj_path)
tests/test_pipeline.py:168:
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/pandas/io.py:309: in read_feather
    return DataFrame(query_compiler=EngineDispatcher.read_feather(**kwargs))
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/data_management/factories/dispatcher.py:132: in read_feather
    return cls.__engine._read_feather(**kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/data_management/factories/factories.py:115: in _read_feather
    return cls.io_cls.read_feather(**kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/file_reader.py:29: in read
    query_compiler = cls._read(*args, **kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/feather_reader.py:38: in _read
    return cls.build_query_compiler(path, df.columns, use_threads=False)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/column_store_reader.py:111: in build_query_compiler
    index, row_lens = cls.build_index(partition_ids)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/column_store_reader.py:62: in build_index
    index_len = cls.materialize(partition_ids[-2][0])
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py:29: in materialize
    return ray.get(obj_id)
object_refs = [ObjectRef(a49175458b016c38ffffffff0100000005000000)]
def get(object_refs, *, timeout=None):
    """Get a remote object or a list of remote objects from the object store.
    This method blocks until the object corresponding to the object ref is
    available in the local object store. If this object is not in the local
    object store, it will be shipped from an object store that has it (once the
    object has been created). If object_refs is a list, then the objects
    corresponding to each object in the list will be returned.
    This method will issue a warning if it's running inside async context,
    you can use ``await object_ref`` instead of ``ray.get(object_ref)``. For
    a list of object refs, you can use ``await asyncio.gather(*object_refs)``.
    Args:
        object_refs: Object ref of the object to get or a list of object refs
            to get.
        timeout (Optional[float]): The maximum amount of time in seconds to
            wait before returning.
    Returns:
        A Python object or a list of Python objects.
    Raises:
        GetTimeoutError: A GetTimeoutError is raised if a timeout is set and
            the get takes longer than timeout to return.
        Exception: An exception is raised if the task that created the object
            or that created one of the objects raised an exception.
    """
    worker = global_worker
    worker.check_connected()
    if hasattr(
            worker,
            "core_worker") and worker.core_worker.current_actor_is_asyncio():
        global blocking_get_inside_async_warned
        if not blocking_get_inside_async_warned:
            logger.warning("Using blocking ray.get inside async actor. "
                           "This blocks the event loop. Please use `await` "
                           "on object ref with asyncio.gather if you want to "
                           "yield execution to the event loop instead.")
            blocking_get_inside_async_warned = True
    with profiling.profile("ray.get"):
        is_individual_id = isinstance(object_refs, ray.ObjectRef)
        if is_individual_id:
            object_refs = [object_refs]
        if not isinstance(object_refs, list):
            raise ValueError("'object_refs' must either be an object ref "
                             "or a list of object refs.")
        global last_task_error_raise_time
        # TODO(ujvl): Consider how to allow user to retrieve the ready objects.
        values, debugger_breakpoint = worker.get_objects(
            object_refs, timeout=timeout)
        for i, value in enumerate(values):
            if isinstance(value, RayError):
                last_task_error_raise_time = time.time()
                if isinstance(value, ray.exceptions.ObjectLostError):
                    worker.core_worker.dump_object_store_memory_usage()
                if isinstance(value, RayTaskError):
                    raise value.as_instanceof_cause()
E ray.exceptions.RayTaskError(ValueError): ray::deploy_ray_func() (pid=26838, ip=192.168.0.115)
E   File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py", line 19, in deploy_ray_func
E     return func(**args)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/backends/pandas/parsers.py", line 404, in parse
E     df = feather.read_feather(fname, **kwargs)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/pyarrow/feather.py", line 215, in read_feather
E     return (read_table(source, columns=columns, memory_map=memory_map)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/pyarrow/feather.py", line 257, in read_table
E     elif sorted(set(columns)) == columns:
E ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/ray/worker.py:1379: RayTaskError(ValueError)
My pip list shows:
Package Version
aiohttp 3.7.3
aiohttp-cors 0.7.0
aioredis 1.3.1
appdirs 1.4.4
argon2-cffi 20.1.0
arrow 0.17.0
asciitree 0.3.3
async-generator 1.10
async-timeout 3.0.1
attrs 20.3.0
backcall 0.2.0
backports.functools-lru-cache 1.6.1
binaryornot 0.4.4
bleach 3.2.1
blessings 1.7
boto3 1.16.53
botocore 1.19.53
brotlipy 0.7.0
build 0.1.0
cachetools 4.2.0
certifi 2020.12.5
cffi 1.14.4
chardet 3.0.4
check-manifest 0.45
click 8.0.1
cloudpickle 1.6.0
colorama 0.4.4
colorful 0.5.4
cookiecutter 1.7.2
coverage 5.3.1
cryptography 3.3.1
cyber-utils-abneet-wats 0.5.2
dask 2.19.0
decorator 4.4.2
decouple 0.0.7
defusedxml 0.6.0
distlib 0.3.1
distributed 2.19.0
entrypoints 0.3
fasteners 0.14.1
filelock 3.0.12
google-api-core 1.24.1
google-auth 1.24.0
googleapis-common-protos 1.52.0
gpustat 0.6.0
grpcio 1.34.0
HeapDict 1.0.1
hiredis 1.1.0
idna 2.10
importlib-metadata 2.1.1
iniconfig 1.1.1
ipykernel 5.4.2
ipython 7.19.0
ipython-genutils 0.2.0
ipywidgets 7.6.2
jedi 0.18.0
Jinja2 2.11.2
jinja2-time 0.2.0
jmespath 0.10.0
joblib 1.0.0
jsonschema 3.2.0
jupyter-client 6.1.7
jupyter-console 6.2.0
jupyter-core 4.7.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
MarkupSafe 1.1.1
mistune 0.8.4
modin 0.10.1
monotonic 1.5
msgpack 1.0.2
multidict 5.1.0
nbclient 0.5.1
nbconvert 6.0.7
nbformat 5.0.8
nest-asyncio 1.4.3
notebook 6.1.6
numcodecs 0.7.2
numpy 1.19.4
nvidia-ml-py3 7.352.0
opencensus 0.7.11
opencensus-context 0.1.2
packaging 20.8
pandas 1.3.0
pandocfilters 1.4.2
parso 0.8.1
pbr 5.5.1
pep517 0.9.1
pexpect 4.8.0
pickleshare 0.7.5
pip 20.3.3
pip-tools 5.5.0
pluggy 0.13.1
poyo 0.5.0
prometheus-client 0.9.0
prompt-toolkit 3.0.8
protobuf 3.17.3
psutil 5.8.0
ptyprocess 0.6.0
py 1.10.0
py-spy 0.3.3
pyarrow 1.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydantic 1.8.2
Pygments 2.7.3
pyOpenSSL 20.0.1
pyparsing 2.4.7
PyQt5 5.12.3
PyQt5-sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.17.3
PySocks 1.7.1
pytest 6.2.1
pytest-dependency 0.5.1
python-dateutil 2.8.1
python-slugify 4.0.1
pytz 2020.5
PyYAML 5.3.1
pyzmq 20.0.0
qtconsole 5.0.1
QtPy 1.9.0
ray 1.4.1
redis 3.5.3
requests 2.25.1
rsa 4.6
s3transfer 0.3.4
scikit-learn 0.24.0
scipy 1.6.0
Send2Trash 1.5.0
setuptools 49.6.0.post20201009
six 1.15.0
sniffio 1.2.0
sortedcontainers 2.3.0
tblib 1.7.0
terminado 0.9.1
testpath 0.4.4
text-unidecode 1.3
threadpoolctl 2.1.0
toml 0.10.2
toolz 0.11.1
tornado 6.1
tox 3.20.1
traitlets 5.0.5
typing-extensions 3.7.4.3
urllib3 1.26.2
virtualenv 20.2.2
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.36.2
widgetsnbextension 3.5.1
xgboost 1.3.1
yarl 1.6.3
zarr 2.6.1
zict 2.0.0
zipp 3.4.0
Issue Analytics
- Created 3 years ago
- Comments: 11 (5 by maintainers)
It seems that this depends on the OS. I have just tried reading the file on Windows and got the same error. We will take a look at it.
This is slotted for the next release. Thanks everyone for adding context here.
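For anyone reading the traceback: the ValueError is raised on the pyarrow line `elif sorted(set(columns)) == columns:`, and the Modin frame above it passes `df.columns` rather than a plain list. The sketch below is only an illustration of that kind of failure, assuming `columns` reaches pyarrow as an array-like; it uses a NumPy array with made-up column names as a stand-in, and the `check` helper is hypothetical, not pyarrow or Modin code.

```python
import numpy as np

# Stand-in for the column labels (hypothetical names). In the traceback Modin
# passes `df.columns`, which is array-like rather than a plain list.
columns = np.array(["a", "b", "c"])

# Comparing a list with an ndarray is element-wise and yields a boolean array,
# not a single True/False.
comparison = sorted(set(columns)) == columns


def check(cols):
    """Mirror the shape of the failing `sorted(set(columns)) == columns` test."""
    try:
        if sorted(set(cols)) == cols:
            return "sorted and unique"
        return "not sorted or not unique"
    except ValueError as exc:
        # Taking the truth value of a multi-element boolean array raises here.
        return f"ValueError: {exc}"


result_array = check(columns)       # array-like input: the comparison is ambiguous
result_list = check(list(columns))  # plain list input: ordinary list equality
```

If that assumption holds, converting the labels to a plain list before they reach the comparison avoids the ambiguous truth value, which is what the second call shows.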