[KED-2643] kedro install fails on pyspark starter


Description

After creating a new kedro project from the pyspark starter in a brand-new conda environment, kedro install fails. It seems that kedro imports the project’s settings module, which in turn imports the module with the project context (where the import from pyspark is done), and fails because pyspark is not yet installed. Other CLI commands (e.g. kedro --version) fail with the same error when executed inside the project’s directory.
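To illustrate why a top-level import matters here, below is a minimal sketch (not the starter's actual code) in which the pyspark import is deferred into the method that needs it. The method name `build_spark_conf` and the shape of `options` are hypothetical; only the `ProjectContext` class name comes from the traceback. With this layout, importing the module that defines the class no longer requires pyspark to be installed.

```python
import importlib.util


class ProjectContext:
    """Hypothetical stand-in for the class defined in the starter's context.py."""

    def build_spark_conf(self, options):
        # Deferred import: evaluated only when this method is called, so
        # importing the module that defines ProjectContext needs no pyspark.
        from pyspark import SparkConf

        return SparkConf().setAll(list(options.items()))


# Defining (and therefore importing) the class works with or without pyspark.
context = ProjectContext()
pyspark_available = importlib.util.find_spec("pyspark") is not None
print(f"pyspark installed: {pyspark_available}")
```

Calling `build_spark_conf` would still raise ModuleNotFoundError when pyspark is missing, but only at the point where Spark is actually used, not when the CLI starts up.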

Steps to Reproduce

  • Create a new environment
  • pip install kedro
  • kedro new --starter=pyspark
  • cd to project’s directory
  • kedro install

Expected Result

kedro installs the packages specified in requirements.txt. Not sure why the CLI goes into project settings, though I guess there are several CLI commands that do need to care about project specifics anyway.
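Until the CLI behaves as expected, one possible workaround is to install the requirements with pip directly, bypassing the kedro CLI. The sketch below assumes the requirements file lives at src/requirements.txt relative to the project root (an assumption about the starter's layout, adjust as needed); the helper names are made up for illustration.

```python
import subprocess
import sys


def pip_install_command(requirements_file):
    """Build a pip command that targets the current interpreter's environment."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements_file)]


def install_requirements(requirements_file):
    """Run pip; raises CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(requirements_file), check=True)


# Only builds and prints the command; nothing is installed at this point.
command = pip_install_command("src/requirements.txt")
print(" ".join(command[1:]))
```

Running `install_requirements("src/requirements.txt")` from the project root would perform the installation. Using `sys.executable -m pip` rather than a bare `pip` keeps the install inside the active conda environment.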

Actual Result

Error with the following stacktrace:

Traceback (most recent call last):
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 268, in main
    cli_collection = KedroCLI(project_path=Path.cwd())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 181, in __init__
    self._metadata = bootstrap_project(project_path)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/startup.py", line 181, in bootstrap_project
    configure_project(metadata.package_name)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 218, in configure_project
    _validate_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 210, in _validate_module
    importlib.import_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/settings.py", line 30, in <module>
    from pyspark_test.context import ProjectContext
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/context.py", line 34, in <module>
    from pyspark import SparkConf
ModuleNotFoundError: No module named 'pyspark'
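The import chain in the traceback can be reproduced in miniature without kedro or pyspark. The sketch below is only an illustration of the mechanism: a stand-in settings module with a top-level import of a package that is not installed is loaded through `importlib.import_module`, the same call that appears in the traceback. The module and package names are made up, and `import_settings` is a hypothetical helper, not part of kedro.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_settings(src_dir, module_name):
    """Import a settings module from src_dir, as a CLI might do at startup."""
    sys.path.insert(0, str(src_dir))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(src_dir))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in settings module whose top-level import targets a package
    # that is deliberately not installed (the name is made up).
    Path(tmp, "demo_settings_kedro2643.py").write_text(
        "import package_not_installed_kedro2643\n"
    )
    try:
        import_settings(tmp, "demo_settings_kedro2643")
        error_name = None
    except ModuleNotFoundError as exc:
        # exc.name holds the name of the module that could not be found.
        error_name = exc.name

print(error_name)
```

Any command that triggers this import at startup fails the same way, which matches the report that even kedro --version breaks inside the project directory.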

Your Environment

  • macOS Catalina (originally got it on Windows 10)
  • PyCharm CE 2020.3.2
  • Conda environment (Python 3.7.10)
  • Kedro 0.17.3

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

limdauto commented, Jul 29, 2021 (2 reactions)

merelcht commented, May 19, 2021 (1 reaction)

Hi @glebrh, thanks for flagging this issue! This indeed isn’t working properly. I’ve created a ticket on our backlog to address it.


