
Managing the Spark Session

See original GitHub issue

In the spark-fast-tests README, we encourage users to wrap the Spark Session in a trait and mix in the trait to test classes that need access to the Spark Session.

import org.apache.spark.sql.SparkSession

trait SparkSessionTestWrapper {

  lazy val spark: SparkSession = {
    SparkSession.builder().master("local").appName("spark session").getOrCreate()
  }

}
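For context, a test class might mix in this trait like so. This is a minimal sketch using ScalaTest's `FunSpec` (the 2017-era `org.scalatest.FunSpec` import; newer ScalaTest versions use `org.scalatest.funspec.AnyFunSpec`); the class name and column names are illustrative, not from the README:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSpec

// Repeated from above so the example is self-contained
trait SparkSessionTestWrapper {

  lazy val spark: SparkSession = {
    SparkSession.builder().master("local").appName("spark session").getOrCreate()
  }

}

// Illustrative test class: the SparkSession comes from the mixed-in trait,
// not from the testing framework
class DatasetSpec extends FunSpec with SparkSessionTestWrapper {

  import spark.implicits._

  it("builds a DataFrame with the shared lazy SparkSession") {
    val df = Seq("hello", "world").toDF("word")
    assert(df.count() == 2)
  }

}
```

Because `spark` is a `lazy val`, the session is only created when a test first touches it, and every test class mixing in the trait shares the same session via `getOrCreate()`.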

The spark-testing-base library uses the EvilSessionTools approach to extract the SQL context.

I don’t think the testing framework should have any knowledge or control over the Spark Session. The Spark Session management should take place in the application and the test framework should simply provide tools that help with assertions.
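To make the contrast concrete, here is a sketch of what "assertion tools only" looks like: the helper below compares two DataFrames handed to it by the application and never creates or touches a SparkSession. The object and method names here are illustrative (spark-fast-tests exposes similar comparers such as `assertSmallDataFrameEquality`), and the comparison is deliberately naive:

```scala
import org.apache.spark.sql.DataFrame

// Illustrative assertion helper: it receives DataFrames built by the
// application's own SparkSession and performs no session management
object DataFrameAssertions {

  def assertSmallDataFrameEquality(actual: DataFrame, expected: DataFrame): Unit = {
    if (actual.schema != expected.schema) {
      throw new AssertionError(
        s"Schemas differ: ${actual.schema} vs ${expected.schema}"
      )
    }
    // Order-sensitive, collect-based comparison: acceptable for the small
    // DataFrames typically used in unit tests
    if (!actual.collect().sameElements(expected.collect())) {
      throw new AssertionError("DataFrame contents differ")
    }
  }

}
```

Since the helper's only inputs are DataFrames, the application (or a trait like `SparkSessionTestWrapper`) stays in full control of session configuration and lifecycle.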

@snithish @eclosson - I would like your feedback on this intentional design decision. Thanks!

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

MrPowers commented, May 19, 2017 (1 reaction)

@eclosson @snithish - I created an issue in the spark-testing-base repo: https://github.com/holdenk/spark-testing-base/issues/186 Feel free to chime in on the issue if you’d like to add anything.

MrPowers commented, May 18, 2017 (1 reaction)

@eclosson @snithish - Thanks for the feedback. I’m going to open an issue in the spark-testing-base library and see if they would consider removing the SparkSession management from DataFrameSuiteBase. I’d like the good ideas we develop in this testing framework to get reflected in the spark-testing-base repo as well since it’s more popular.
