Classpath conflicts between multiple instances of ScalaInterpreter
I have multiple instances of ScalaInterpreter in my custom kernel. I'm using SparkSession inside, and for each interpreter I use the following startup code:
import org.apache.spark.sql._

val sparkSession = NotebookSparkSession.builder()
  .master("local[1]")
  .getOrCreate()
I start my kernel with the option specificLoader=false to share the same instance of sparkContext, and everything works fine. But from time to time (the same code might work or throw an exception) I see serialization errors from Spark, and I believe they are related to the class loader:
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
If I start my kernel with specificLoader=true I don't see this error at all, but every interpreter creates a new instance of sparkContext, which I want to avoid.
My test code is the following:
val sc = sparkSession.sparkContext
val rdd = sc.parallelize(1 to 100, 10)
val n = rdd.map(_ + 1).sum()
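A quick way to check whether the class loader is what differs between runs (a diagnostic sketch, not part of the original report; it assumes ammonite-spark has set the spark.repl.class.uri key, which holds the URI of the class server the executors fetch REPL-generated classes from):

// Run this in each interpreter: with a shared SparkContext, both should report the
// same context class loader and the same class-server URI; if they differ, the
// executors can only reach one interpreter's generated classes.
println(Thread.currentThread().getContextClassLoader)
println(sparkSession.sparkContext.getConf.getOption("spark.repl.class.uri"))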
I don't have a definite answer, but at the very end, the package is set in generated code around here. By navigating your way in the Ammonite code from there, it may be possible to pass a custom package prefix to ammonite.interp.Interpreter so that it ends up being used there.

Alternatively, the cmd prefix in the class name is defined here. Just allowing it to be customized via a field passed to ammonite.interp.Interpreter (and setting it yourself to "cmd1_", "cmd2_", etc., so that the classes are named "cmd1_1", "cmd1_2", …) may work as well, and could be more straightforward.

The Ammonite README has documentation detailing manual commands to quickly test that kind of change.
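To make the suggested naming scheme concrete (purely an illustrative sketch; the helper below is hypothetical and not an existing Ammonite or almond API, it only shows how per-interpreter prefixes would keep generated class names distinct):

// Hypothetical helper: each interpreter gets its own prefix, so wrapper class
// names like cmd1_1 and cmd2_1 can never collide across interpreters.
def wrapperName(interpreterId: Int, commandIndex: Int): String =
  s"cmd${interpreterId}_$commandIndex"

assert(wrapperName(1, 1) == "cmd1_1") // first command of interpreter 1
assert(wrapperName(2, 1) == "cmd2_1") // first command of interpreter 2, distinct name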
I'm not sure I understand what you're trying to achieve: you have a custom kernel, using ScalaInterpreter, and you're trying to have it act like multiple kernels at once, all sharing the same SparkSession, right?

If that's the case, one issue I see is that each instance of ScalaInterpreter will set spark.repl.class.uri in the Spark conf. This setting corresponds to the URI of a small web server that ammonite-spark launches to serve the byte code of the classes generated during the session. I'm not sure which one Spark will retain in the end, but only one of them will likely be accessible. Parts of the logic of ammonite-spark would need to be customized so that only one such server is spawned for all the sessions.

Another problem you'll likely run into (I see you alluded to it on the Ammonite gitter) is that the ScalaInterpreters will generate classes with similar names (like cmd1, cmd2, etc.), which is going to be a problem for the Spark executors, which won't be able to distinguish between the classes of the various interpreters. That may require some customization in Ammonite (to put all classes of a session in a custom package, for example).
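To illustrate the spark.repl.class.uri point above (a sketch with placeholder URIs, using only the plain SparkConf API; it just shows that a single value per key survives, which is why at most one class server would stay reachable):

import org.apache.spark.SparkConf

// Two interpreters each setting the key on the same conf: the last write wins,
// so executors would only ever see one class-server URI.
val conf = new SparkConf()
conf.set("spark.repl.class.uri", "http://interpreter-1.example:12345") // placeholder
conf.set("spark.repl.class.uri", "http://interpreter-2.example:23456") // placeholder
println(conf.get("spark.repl.class.uri")) // prints the second URI only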