[KED-1056] pyarrow imported by package created by `kedro new` is problematic

Description

A package created by the kedro new command seems to import pyarrow, which causes a problem.

Context

I tried to use the Python package “ray” inside a package created by the kedro new command, but import ray raises the following error.

    import ray
  File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
    raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).

This issue was reported at ray’s GitHub repo (https://github.com/ray-project/ray/issues/5497), but hasn’t been resolved yet.

This error itself is ray’s issue, but I found that it does not reproduce outside of a package created by kedro new.

Steps to Reproduce

Here are 3 experiments:

[Case 1] Code in a module in the package created by kedro new (package_created_by_kedro_new.nodes.example.py)

import ray 

ImportError: Ray must be imported before pyarrow ... as shown above.

[Case 2] Code in a module outside of the package created by kedro new

import kedro
print(">>> imported kedro: ", kedro.__version__)
import pyarrow
print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)

Output:

>>> imported kedro:  0.15.1
>>> pyarrow NOT imported by ray:  0.12.0
Traceback (most recent call last):
  File "/home/u/Minyus/_Python_scratch/scratch.py", line 5, in <module>
    import ray
  File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
    raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).

Process finished with exit code 1

[Case 3] Code in a module outside of the package created by kedro new

import kedro
print(">>> imported kedro: ", kedro.__version__)
# import pyarrow
# print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)
import pyarrow
print(">>> pyarrow imported by ray: ", pyarrow.__version__)
print(">>> No error is returned!")

Output:

>>> imported kedro:  0.15.1
>>> imported ray:  0.7.3
>>> pyarrow imported by ray:  0.14.0.RAY
>>> No error is returned!

Process finished with exit code 0
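
Cases 2 and 3 differ only in whether pyarrow is loaded before ray. A minimal sketch for checking which situation a given piece of code is in, assuming ray's guard depends on what is already present in sys.modules (is_loaded is a hypothetical helper name, not part of kedro or ray):

```python
import sys


def is_loaded(module_name: str) -> bool:
    """Return True if `module_name` has already been imported in this process."""
    return module_name in sys.modules


# Run this at the point where `import ray` is about to happen.
if is_loaded("pyarrow"):
    print("pyarrow is already loaded; this is the Case 2 situation")
else:
    print("pyarrow is not loaded yet; this is the Case 3 situation")
```

Placed at the top of the module from Case 1, this would show whether something imported pyarrow before the module itself ran.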

In summary, importing kedro is harmless, but the package created by the kedro new command seems to import pyarrow, which causes the problem. Is there any workaround?

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.15.1 (installed by pip install kedro)
  • Python version used (python -V): Python 3.6.8
  • Operating system and version: Ubuntu 18.04.3 LTS

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
Minyus commented, Sep 19, 2019

“Thanks @Minyus, could it possibly be a duplicate of this issue?”

Hi @Pet3ris, I believe my issue is technically different, although both are related to pyarrow. While https://github.com/quantumblacklabs/kedro/issues/94 is an installation failure when using pipenv, I do not use pipenv, and I can install and import Kedro without errors. I can use the kedro new command as well.

1 reaction
ghost commented, Feb 4, 2020

From what I can understand, it’s not kedro new that is the problem, but running any code that includes import ray with kedro run. kedro/cli.py imports stuff that imports stuff, which eventually imports kedro.io, which imports datasets, which imports s3fs, which imports fsspec, which imports pyarrow, so pyarrow will be present in sys.modules long before kedro run runs import ray.

We’re restructuring our contrib and dependencies significantly, which would have potentially solved your problem, but we’re also moving all our datasets to use fsspec (which imports pyarrow) – so if you want to use any fsspec dataset, you’ll be forced to have pyarrow.

I think this is really something on ray’s side to fix.
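
The import chain described in this comment could be confirmed with a small import hook. The following is only a sketch, not something kedro or ray provides: ImportSpy and trace_first_import are hypothetical names, and the approach relies on the standard sys.meta_path mechanism to record the call stack the first time a watched module is requested.

```python
import importlib
import sys
import traceback
from importlib.abc import MetaPathFinder


class ImportSpy(MetaPathFinder):
    """Records the call stack the first time a watched module is requested."""

    def __init__(self, watched):
        self.watched = watched
        self.stacks = {}

    def find_spec(self, fullname, path, target=None):
        if fullname == self.watched and fullname not in self.stacks:
            self.stacks[fullname] = traceback.format_stack()
        return None  # never handle the import; defer to the normal finders


def trace_first_import(watched, trigger):
    """Import `trigger` and return the stack that first requested `watched`.

    Returns None if `watched` was never requested, which also happens when
    it (or `trigger`) is already cached in sys.modules.
    """
    spy = ImportSpy(watched)
    sys.meta_path.insert(0, spy)
    try:
        importlib.import_module(trigger)
    except ImportError:
        pass  # the trigger or the watched module may not be installed
    finally:
        sys.meta_path.remove(spy)
    return spy.stacks.get(watched)


# Example (run in a fresh interpreter, before anything imports pyarrow):
# stack = trace_first_import("pyarrow", "kedro.io")
# print("".join(stack or ["pyarrow was not requested\n"]))
```

The printed stack should list the files involved in the chain, which would show whether s3fs and fsspec are in fact the modules that pull in pyarrow in a given environment.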
