[KED-1056] pyarrow imported by package created by `kedro new` is problematic
Description
The package created by the `kedro new` command seems to import pyarrow, which causes a problem.
Context
I tried to use a Python package called “ray” inside the package created by the `kedro new` command, but `import ray` raises the following error.
import ray
File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).
This issue was reported at ray’s GitHub repo (https://github.com/ray-project/ray/issues/5497), but hasn’t been resolved yet.
This error itself is ray’s issue, but I found that it does not reproduce outside of a package created by `kedro new`.
Steps to Reproduce
Here are 3 experiments:
[Case 1] Code in a module in the package created by `kedro new` (package_created_by_kedro_new.nodes.example.py)
import ray
ImportError: Ray must be imported before pyarrow ...
as shown above.
[Case 2] Code in a module outside of the package created by kedro new
import kedro
print(">>> imported kedro: ", kedro.__version__)
import pyarrow
print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)
Output:
>>> imported kedro: 0.15.1
>>> pyarrow NOT imported by ray: 0.12.0
Traceback (most recent call last):
File "/home/u/Minyus/_Python_scratch/scratch.py", line 5, in <module>
import ray
File "/usr/local/lib/python3.6/dist-packages/ray/__init__.py", line 9, in <module>
raise ImportError("Ray must be imported before pyarrow because Ray "
ImportError: Ray must be imported before pyarrow because Ray requires a specific version of pyarrow (which is packaged along with Ray).
Process finished with exit code 1
[Case 3] Code in a module outside of the package created by kedro new
import kedro
print(">>> imported kedro: ", kedro.__version__)
# import pyarrow
# print(">>> pyarrow NOT imported by ray: ", pyarrow.__version__)
import ray
print(">>> imported ray: ", ray.__version__)
import pyarrow
print(">>> pyarrow imported by ray: ", pyarrow.__version__)
print(">>> No error is returned!")
Output:
>>> imported kedro: 0.15.1
>>> imported ray: 0.7.3
>>> pyarrow imported by ray: 0.14.0.RAY
>>> No error is returned!
Process finished with exit code 0
In summary, importing kedro is harmless, but the package created by the `kedro new` command seems to import pyarrow, which causes a problem.
Is there a workaround?
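To narrow down which import pulls in pyarrow, one option is to record the call stack at the moment pyarrow is first requested. The following is a minimal diagnostic sketch, not part of Kedro or Ray: `ImportWatcher` and `watch_import` are hypothetical helper names, and the module path in the usage section is the one from Case 1. The watcher only sees the first import of the target, so it has to be installed before anything else has loaded pyarrow.

```python
import importlib
import sys
import traceback


class ImportWatcher:
    """Meta path finder that records the call stack when `target` is first imported."""

    def __init__(self, target):
        self.target = target
        self.stacks = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.target:
            # Capture who asked for the module, then let the normal finders handle it.
            self.stacks.append(traceback.format_stack())
        return None


def watch_import(target):
    """Install a watcher for `target` at the front of sys.meta_path and return it."""
    watcher = ImportWatcher(target)
    sys.meta_path.insert(0, watcher)
    return watcher


if __name__ == "__main__":
    watcher = watch_import("pyarrow")
    try:
        # Module path taken from Case 1; replace it with your own project module.
        importlib.import_module("package_created_by_kedro_new.nodes.example")
    except ImportError:
        pass  # the Ray ImportError (or a missing module) is expected here
    for stack in watcher.stacks:
        print("".join(stack))
```

If the printed stack is empty, pyarrow was either never imported or was already loaded before the watcher was installed.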
Your Environment
- Kedro version used (`pip show kedro` or `kedro -V`): 0.15.1 (installed by `pip install kedro`)
- Python version used (`python -V`): Python 3.6.8
- Operating system and version: Ubuntu 18.04.3 LTS
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Hi @Pet3ris, I believe my issue is technically different although both are related to pyarrow. While https://github.com/quantumblacklabs/kedro/issues/94 is an installation failure using pipenv, I do not use pipenv and I can install and import Kedro without errors. I can use the `kedro new` command as well.

From what I can understand, it’s not `kedro new` that is the problem, but running any code that includes `import ray` with `kedro run`, since `kedro/cli.py` imports stuff that imports stuff, that eventually imports `kedro.io`, which imports datasets, which import `s3fs`, which imports `fsspec`, which imports `pyarrow`, so `pyarrow` will be present in `sys.modules` long before `kedro run` runs `import ray`.

We’re restructuring our `contrib` and dependencies significantly, which would have potentially solved your problem, but we’re also moving all our datasets to use `fsspec` (which imports `pyarrow`), so if you want to use any `fsspec` dataset, you’ll be forced to have `pyarrow`.

I think this is really something on `ray`’s side to fix.
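The import-order explanation in the comment above can be checked directly by inspecting `sys.modules` just before the `import ray` line. This is a minimal sketch under the assumption that the Ray version in question refuses to load whenever `pyarrow` is already in the module table, as its error message suggests. `loaded_top_level` and `safe_to_import_ray` are hypothetical helper names, not part of Kedro or Ray.

```python
import sys


def loaded_top_level(name, modules=None):
    """Return True if package `name` or any of its submodules is in the module table."""
    modules = sys.modules if modules is None else modules
    prefix = name + "."
    return any(m == name or m.startswith(prefix) for m in modules)


def safe_to_import_ray(modules=None):
    """Assumption: `import ray` fails only if pyarrow has already been imported."""
    return not loaded_top_level("pyarrow", modules)


if __name__ == "__main__":
    # Place this check immediately before `import ray` in the failing module.
    print("pyarrow already imported:", loaded_top_level("pyarrow"))
    print("safe to import ray:", safe_to_import_ray())
```

Run inside a node module under `kedro run`, the first line would be expected to report `True` if the import chain described above applies. In a fresh interpreter that has imported only `kedro` (as in Case 3), it should report `False`.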