[KED-2643] kedro install fails on pyspark starter
Description
After creating a new kedro project in a brand new conda environment using the pyspark starter, kedro install fails.
It seems that kedro tries to import the module containing the project context (which imports from pyspark) and fails, since pyspark is not yet installed.
Other cli commands (e.g. kedro --version) also fail with the same error when executed inside the project’s directory.
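For illustration, the failure mode can be reproduced without kedro or pyspark. The sketch below is not kedro code: fake_settings and not_installed_dependency_xyz are hypothetical stand-ins for the project's settings.py and for pyspark. It only shows that importing a module at startup also runs that module's own imports, so a single missing dependency is enough to stop every command.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import_settings(package_dir: Path, module_name: str) -> str:
    """Import module_name from package_dir and report the outcome."""
    sys.path.insert(0, str(package_dir))
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return "ok"
    except ModuleNotFoundError as exc:
        # exc.name holds the name of the module that could not be found.
        return f"missing dependency: {exc.name}"
    finally:
        sys.path.remove(str(package_dir))
        sys.modules.pop(module_name, None)


def demo() -> str:
    """Write a stand-in settings module with a missing import and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for settings.py importing an absent package.
        (Path(tmp) / "fake_settings.py").write_text(
            "import not_installed_dependency_xyz\n"
        )
        return try_import_settings(Path(tmp), "fake_settings")


if __name__ == "__main__":
    print(demo())
```

The error surfaces at import time, before any command-specific logic runs, which matches kedro --version failing in the same way as kedro install.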
Steps to Reproduce
- Create a new environment
- pip install kedro
- kedro new --starter=pyspark
- cd to project’s directory
- kedro install
Expected Result
kedro installs the packages specified in requirements.txt. Not sure why the cli goes into project settings at this point, although I guess there are several cli commands that do need to care about project specifics anyway.
Actual Result
Error with the following stacktrace:
Traceback (most recent call last):
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 268, in main
    cli_collection = KedroCLI(project_path=Path.cwd())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 181, in __init__
    self._metadata = bootstrap_project(project_path)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/startup.py", line 181, in bootstrap_project
    configure_project(metadata.package_name)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 218, in configure_project
    _validate_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 210, in _validate_module
    importlib.import_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/settings.py", line 30, in <module>
    from pyspark_test.context import ProjectContext
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/context.py", line 34, in <module>
    from pyspark import SparkConf
ModuleNotFoundError: No module named 'pyspark'
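The last two frames show the pyspark import running at module level in context.py. One possible workaround, sketched below, is to defer that import until Spark is actually needed, so that importing settings.py succeeds even before the requirements are installed. This is only an assumption about how a project could be adapted, not the fix adopted by the maintainers. build_spark_conf and its module_name parameter are hypothetical names introduced for the example.

```python
import importlib
from typing import Any, Dict


def build_spark_conf(parameters: Dict[str, str], module_name: str = "pyspark") -> Any:
    """Create a SparkConf, importing pyspark only when this function is called."""
    try:
        # Deferred import: nothing from pyspark is loaded at module import time.
        spark_module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{module_name} is not installed; install the project requirements first."
        ) from exc
    # SparkConf.setAll accepts an iterable of (key, value) pairs.
    return spark_module.SparkConf().setAll(parameters.items())
```

With this shape, a module that defines build_spark_conf can be imported in an environment without pyspark, and the missing dependency is reported only when a Spark configuration is requested.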
Your Environment
- MacOS Catalina (originally got it on Windows 10)
- PyCharm CE 2020.3.2
- Conda environment (Python 3.7.10)
- Kedro 0.17.3
Issue Analytics
- Created 2 years ago
- Comments:5 (3 by maintainers)

@ignacioparicio can we close this as well? https://github.com/quantumblacklabs/kedro-starters/issues/38
Hi @glebrh, thanks for flagging this issue! This indeed isn’t working properly. I’ve created a ticket on our backlog to address it.