RayTaskError while using pd.read_feather()

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux watsnet 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Modin version (modin.__version__): 0.8.2 0.10.1
  • Python version: Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20) [GCC 9.3.0] on linux
  • Code we can use to reproduce:
    import modin.pandas as pd
    pd.read_feather('_default_rule_obj.feather')
    

Describe the problem

Source code / logs

@pytest.mark.dependency(depends=["test_apply_default_rule"])
def test_apply_within_chapter_rule() -> None:
    config = test_load_config()
    with open(test_data_obj_path, 'rb') as _input1:
        data_obj = pickle.load(_input1)
  x_df = pd.read_feather(test_default_rule_obj_path)

tests/test_pipeline.py:168:


…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/pandas/io.py:309: in read_feather
    return DataFrame(query_compiler=EngineDispatcher.read_feather(**kwargs))
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/data_management/factories/dispatcher.py:132: in read_feather
    return cls.__engine._read_feather(**kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/data_management/factories/factories.py:115: in _read_feather
    return cls.io_cls.read_feather(**kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/file_reader.py:29: in read
    query_compiler = cls._read(*args, **kwargs)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/feather_reader.py:38: in _read
    return cls.build_query_compiler(path, df.columns, use_threads=False)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/column_store_reader.py:111: in build_query_compiler
    index, row_lens = cls.build_index(partition_ids)
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/base/io/column_stores/column_store_reader.py:62: in build_index
    index_len = cls.materialize(partition_ids[-2][0])
…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py:29: in materialize
    return ray.get(obj_id)


object_refs = [ObjectRef(a49175458b016c38ffffffff0100000005000000)]

def get(object_refs, *, timeout=None):
    """Get a remote object or a list of remote objects from the object store.

    This method blocks until the object corresponding to the object ref is
    available in the local object store. If this object is not in the local
    object store, it will be shipped from an object store that has it (once the
    object has been created). If object_refs is a list, then the objects
    corresponding to each object in the list will be returned.

    This method will issue a warning if it's running inside async context,
    you can use ``await object_ref`` instead of ``ray.get(object_ref)``. For
    a list of object refs, you can use ``await asyncio.gather(*object_refs)``.

    Args:
        object_refs: Object ref of the object to get or a list of object refs
            to get.
        timeout (Optional[float]): The maximum amount of time in seconds to
            wait before returning.

    Returns:
        A Python object or a list of Python objects.

    Raises:
        GetTimeoutError: A GetTimeoutError is raised if a timeout is set and
            the get takes longer than timeout to return.
        Exception: An exception is raised if the task that created the object
            or that created one of the objects raised an exception.
    """
    worker = global_worker
    worker.check_connected()

    if hasattr(
            worker,
            "core_worker") and worker.core_worker.current_actor_is_asyncio():
        global blocking_get_inside_async_warned
        if not blocking_get_inside_async_warned:
            logger.warning("Using blocking ray.get inside async actor. "
                           "This blocks the event loop. Please use `await` "
                           "on object ref with asyncio.gather if you want to "
                           "yield execution to the event loop instead.")
            blocking_get_inside_async_warned = True

    with profiling.profile("ray.get"):
        is_individual_id = isinstance(object_refs, ray.ObjectRef)
        if is_individual_id:
            object_refs = [object_refs]

        if not isinstance(object_refs, list):
            raise ValueError("'object_refs' must either be an object ref "
                             "or a list of object refs.")

        global last_task_error_raise_time
        # TODO(ujvl): Consider how to allow user to retrieve the ready objects.
        values, debugger_breakpoint = worker.get_objects(
            object_refs, timeout=timeout)
        for i, value in enumerate(values):
            if isinstance(value, RayError):
                last_task_error_raise_time = time.time()
                if isinstance(value, ray.exceptions.ObjectLostError):
                    worker.core_worker.dump_object_store_memory_usage()
                if isinstance(value, RayTaskError):
                  raise value.as_instanceof_cause()

E ray.exceptions.RayTaskError(ValueError): ray::deploy_ray_func() (pid=26838, ip=192.168.0.115)
E   File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py", line 19, in deploy_ray_func
E     return func(**args)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/modin/backends/pandas/parsers.py", line 404, in parse
E     df = feather.read_feather(fname, **kwargs)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/pyarrow/feather.py", line 215, in read_feather
E     return (read_table(source, columns=columns, memory_map=memory_map)
E   File "/home/abneet/miniconda3/envs/cyber/lib/python3.7/site-packages/pyarrow/feather.py", line 257, in read_table
E     elif sorted(set(columns)) == columns:
E ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

…/…/…/miniconda3/envs/cyber/lib/python3.7/site-packages/ray/worker.py:1379: RayTaskError(ValueError)
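
For anyone reading the traceback above: the failing line is `elif sorted(set(columns)) == columns:` in pyarrow's `read_table`. Here is a minimal sketch of how that expression can raise this exact `ValueError`, assuming `columns` reaches pyarrow as a NumPy array (or similar array-like) rather than a plain list. That is only a guess based on the traceback, not something confirmed in this report. The helper name `compare_like_traceback` and the sample column names are made up for illustration.

```python
import numpy as np


def compare_like_traceback(columns):
    """Evaluate the same comparison that appears in the failing `elif`."""
    # With a list, `==` yields a single bool.
    # With a NumPy array, `==` is element-wise and yields an array of bools,
    # which cannot be used directly as an `if`/`elif` condition.
    if sorted(set(columns)) == columns:
        return True
    return False


# Hypothetical column names, already sorted and unique.
columns_as_array = np.array(["a", "b", "c"])

try:
    compare_like_traceback(columns_as_array)
    array_error = None
except ValueError as exc:
    array_error = str(exc)

# Converting to a plain list first avoids the ambiguous truth value.
list_result = compare_like_traceback(columns_as_array.tolist())

print(array_error)
print(list_result)
```

If that assumption holds, converting the column subset to a plain list before it is handed to pyarrow would sidestep the error. Whether that matches the fix that went into Modin is not stated here.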

My pip list shows:

Package Version
aiohttp 3.7.3
aiohttp-cors 0.7.0
aioredis 1.3.1
appdirs 1.4.4
argon2-cffi 20.1.0
arrow 0.17.0
asciitree 0.3.3
async-generator 1.10
async-timeout 3.0.1
attrs 20.3.0
backcall 0.2.0
backports.functools-lru-cache 1.6.1
binaryornot 0.4.4
bleach 3.2.1
blessings 1.7
boto3 1.16.53
botocore 1.19.53
brotlipy 0.7.0
build 0.1.0
cachetools 4.2.0
certifi 2020.12.5
cffi 1.14.4
chardet 3.0.4
check-manifest 0.45
click 8.0.1
cloudpickle 1.6.0
colorama 0.4.4
colorful 0.5.4
cookiecutter 1.7.2
coverage 5.3.1
cryptography 3.3.1
cyber-utils-abneet-wats 0.5.2
dask 2.19.0
decorator 4.4.2
decouple 0.0.7
defusedxml 0.6.0
distlib 0.3.1
distributed 2.19.0
entrypoints 0.3
fasteners 0.14.1
filelock 3.0.12
google-api-core 1.24.1
google-auth 1.24.0
googleapis-common-protos 1.52.0
gpustat 0.6.0
grpcio 1.34.0
HeapDict 1.0.1
hiredis 1.1.0
idna 2.10
importlib-metadata 2.1.1
iniconfig 1.1.1
ipykernel 5.4.2
ipython 7.19.0
ipython-genutils 0.2.0
ipywidgets 7.6.2
jedi 0.18.0
Jinja2 2.11.2
jinja2-time 0.2.0
jmespath 0.10.0
joblib 1.0.0
jsonschema 3.2.0
jupyter-client 6.1.7
jupyter-console 6.2.0
jupyter-core 4.7.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
MarkupSafe 1.1.1
mistune 0.8.4
modin 0.10.1
monotonic 1.5
msgpack 1.0.2
multidict 5.1.0
nbclient 0.5.1
nbconvert 6.0.7
nbformat 5.0.8
nest-asyncio 1.4.3
notebook 6.1.6
numcodecs 0.7.2
numpy 1.19.4
nvidia-ml-py3 7.352.0
opencensus 0.7.11
opencensus-context 0.1.2
packaging 20.8
pandas 1.3.0
pandocfilters 1.4.2
parso 0.8.1
pbr 5.5.1
pep517 0.9.1
pexpect 4.8.0
pickleshare 0.7.5
pip 20.3.3
pip-tools 5.5.0
pluggy 0.13.1
poyo 0.5.0
prometheus-client 0.9.0
prompt-toolkit 3.0.8
protobuf 3.17.3
psutil 5.8.0
ptyprocess 0.6.0
py 1.10.0
py-spy 0.3.3
pyarrow 1.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydantic 1.8.2
Pygments 2.7.3
pyOpenSSL 20.0.1
pyparsing 2.4.7
PyQt5 5.12.3
PyQt5-sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.17.3
PySocks 1.7.1
pytest 6.2.1
pytest-dependency 0.5.1
python-dateutil 2.8.1
python-slugify 4.0.1
pytz 2020.5
PyYAML 5.3.1
pyzmq 20.0.0
qtconsole 5.0.1
QtPy 1.9.0
ray 1.4.1
redis 3.5.3
requests 2.25.1
rsa 4.6
s3transfer 0.3.4
scikit-learn 0.24.0
scipy 1.6.0
Send2Trash 1.5.0
setuptools 49.6.0.post20201009
six 1.15.0
sniffio 1.2.0
sortedcontainers 2.3.0
tblib 1.7.0
terminado 0.9.1
testpath 0.4.4
text-unidecode 1.3
threadpoolctl 2.1.0
toml 0.10.2
toolz 0.11.1
tornado 6.1
tox 3.20.1
traitlets 5.0.5
typing-extensions 3.7.4.3
urllib3 1.26.2
virtualenv 20.2.2
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.36.2
widgetsnbextension 3.5.1
xgboost 1.3.1
yarl 1.6.3
zarr 2.6.1
zict 2.0.0
zipp 3.4.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

2 reactions
YarShev commented, Jul 15, 2021

It seems that this depends on the OS. I have just tried reading the file on Windows and got the same error. We will take a look at it.

1 reaction
devin-petersohn commented, Sep 20, 2021

This is slotted for the next release. Thanks everyone for adding context here.
