Managing the Spark Session
In the spark-fast-tests README, we encourage users to wrap the Spark Session in a trait and mix in the trait to test classes that need access to the Spark Session.
import org.apache.spark.sql.SparkSession

trait SparkSessionTestWrapper {

  lazy val spark: SparkSession = {
    SparkSession.builder().master("local").appName("spark session").getOrCreate()
  }

}
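A test suite then just mixes in the trait and gets a lazily created session, without the test framework doing any session management of its own. A minimal sketch, assuming ScalaTest's FunSpec (the suite name and the example DataFrame below are illustrative, not from the README):

import org.scalatest.FunSpec

class MyTransformationsSpec extends FunSpec with SparkSessionTestWrapper {

  // Needed for the toDF conversion on local collections
  import spark.implicits._

  it("builds a DataFrame with the expected columns") {
    val df = Seq(("bob", 42), ("alice", 37)).toDF("name", "age")
    assert(df.columns.toSeq == Seq("name", "age"))
  }

}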
The spark-testing-base library uses the EvilSessionTools approach to extract the SQL context.
I don’t think the testing framework should have any knowledge or control over the Spark Session. The Spark Session management should take place in the application and the test framework should simply provide tools that help with assertions.
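To illustrate the distinction, an assertion helper only needs the DataFrames it is handed and can stay completely ignorant of how the session was created. The sketch below is a simplified, hypothetical comparer, not the library's actual implementation (the real spark-fast-tests helpers do more, e.g. richer error reporting):

import org.apache.spark.sql.DataFrame

// Simplified sketch of an assertion-only helper: it never creates,
// configures, or stops a SparkSession; it only compares what it is given.
trait DataFrameAssertions {

  def assertSmallDataFrameEquality(actual: DataFrame, expected: DataFrame): Unit = {
    if (actual.schema != expected.schema) {
      throw new AssertionError(s"Schemas differ: ${actual.schema} vs ${expected.schema}")
    }
    if (!actual.collect().sameElements(expected.collect())) {
      throw new AssertionError("DataFrame contents differ")
    }
  }

}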
@snithish @eclosson - I would like your feedback on this intentional design decision. Thanks!
@eclosson @snithish - Thanks for the feedback. I’m going to open an issue in the spark-testing-base library and see if they would consider removing the SparkSession management from DataFrameSuiteBase. I’d like the good ideas we develop in this testing framework to get reflected in the spark-testing-base repo as well, since it’s more popular.
@eclosson @snithish - I created an issue in the spark-testing-base repo: https://github.com/holdenk/spark-testing-base/issues/186 Feel free to chime in on the issue if you’d like to add anything.