
custom project context class

See original GitHub issue

Description

I would like to use a centralized Spark configuration file under conf/base/spark.yml and pass these settings to the SparkSession.
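
For illustration, the pattern being asked about looks roughly like this in plain PySpark (the setting names, values, and the my_project app name below are placeholders, not taken from the issue):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Example settings of the kind a conf/base/spark.yml might hold,
# e.g. spark.driver.memory: 4g
spark_settings = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}

# Feed the key-value pairs into SparkConf and build the session from it
spark_conf = SparkConf().setAll(spark_settings.items())
spark = (
    SparkSession.builder
    .appName("my_project")
    .config(conf=spark_conf)
    .getOrCreate()
)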

Context

I followed the steps described in the documentation: https://kedro.readthedocs.io/en/stable/11_tools_integration/01_pyspark.html?highlight=SparkSession#initialise-a-sparksession-in-custom-project-context-class

First Issue: KedroContext is not imported in the given example.

Second Issue: if I import it as from kedro.framework.context import KedroContext, I get the error below: TypeError: __init__() got an unexpected keyword argument 'package_name'

File "/root/user_files/envs/xpvu4/lib/python3.7/site-packages/kedro/framework/session/session.py", line 226, in create
    session._setup_logging()
  File "/root/user_files/envs/xpvu4/lib/python3.7/site-packages/kedro/framework/session/session.py", line 244, in _setup_logging
    conf_logging = self._get_logging_config()
  File "/root/user_files/envs/xpvu4/lib/python3.7/site-packages/kedro/framework/session/session.py", line 230, in _get_logging_config
    context = self.load_context()
  File "/root/user_files/envs/xpvu4/lib/python3.7/site-packages/kedro/framework/session/session.py", line 306, in load_context
    extra_params=extra_params,
TypeError: __init__() got an unexpected keyword argument 'package_name'


Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.17.0
  • Python version used (python -V): Python 3.7.9
  • Operating system and version:

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

samhiscoxqb commented, Feb 9, 2021 (2 reactions)

Hello! The example CustomContext needs a little tweak to include package_name as an __init__ argument:

from typing import Any, Dict, Union
from pathlib import Path

from pyspark import SparkConf
from pyspark.sql import SparkSession

from kedro.framework.context import KedroContext


class CustomContext(KedroContext):

    def __init__(
        self,
        package_name: str,                                                 # <= Add package_name here
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        super().__init__(package_name, project_path, env, extra_params)   # <= and here!
        self.init_spark_session()

    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config defined in project's conf folder."""

        # Load the spark configuration in spark.yaml using the config loader
        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder
            .appName(self.package_name)
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")

I’ll open a PR to update this - thanks for raising both 👍
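
For completeness, Kedro also has to be pointed at a custom context like this. A minimal sketch, assuming a 0.17.x-style src/<package_name>/settings.py where the context class is registered via CONTEXT_CLASS (the my_project.context module path is just a placeholder):

# settings.py (sketch only)
from my_project.context import CustomContext

# Use the custom context instead of the default KedroContext.
CONTEXT_CLASS = CustomContext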

remiromi commented, Feb 10, 2021 (1 reaction)

@samhiscoxqb thanks a lot for your help, it worked! I also had to edit another line that you wrote correctly but that is wrong in the docs (https://kedro.readthedocs.io/en/stable/11_tools_integration/01_pyspark.html). In the docs the code is:

# Initialise the spark session
spark_session_conf = (
    SparkSession.builder
    .appName(self.project_name)      # Error Here
    .enableHiveSupport()
    .config(conf=spark_conf)
)

I had to change the appName line to:

# Initialise the spark session
spark_session_conf = (
    SparkSession.builder
    .appName(self.package_name)        # Corrected line
    .enableHiveSupport()
    .config(conf=spark_conf)
)

since it was giving me an AttributeError on the self.project_name call.

Thanks again!
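
Putting the two fixes together, a rough end-to-end sketch (the package name my_project and the project path are placeholders, and the KedroSession.create call follows the 0.17.x API visible in the traceback above):

from kedro.framework.session import KedroSession
from pyspark.sql import SparkSession

# Loading the context runs CustomContext.__init__, which calls init_spark_session().
with KedroSession.create(package_name="my_project", project_path=".") as session:
    context = session.load_context()
    # The builder now returns the session already configured from conf/base/spark.yml.
    spark = SparkSession.builder.getOrCreate()
    print(spark.sparkContext.appName)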
