Classpath conflicts between multiple instances of ScalaInterpreter
I have multiple instances of ScalaInterpreter in my custom kernel. I'm using SparkSession inside, and for each interpreter I use the following startup code:
import org.apache.spark.sql._

val sparkSession = NotebookSparkSession.builder()
  .master("local[1]")
  .getOrCreate()
I start my kernel with the option specificLoader=false to share the same instance of sparkContext, and everything works fine. But from time to time (the same code might work or throw an exception) I see serialization errors from Spark, and I believe they are related to the class loader:
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
If I start my kernel with specificLoader=true I don't see this error at all, but every interpreter creates a new instance of sparkContext, which I want to avoid.
My test code is the following:
val sc = sparkSession.sparkContext
val rdd = sc.parallelize(1 to 100, 10)
val n = rdd.map(_ + 1).sum()
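A quick way to check whether the class loader is what differs between runs (a diagnostic sketch, not part of the original report; it assumes ammonite-spark has set the spark.repl.class.uri key, which holds the URI of the class server the executors fetch REPL-generated classes from):

// Run this in each interpreter: with a shared SparkContext, both should report the
// same context class loader and the same class-server URI; if they differ, the
// executors can only reach one interpreter's generated classes.
println(Thread.currentThread().getContextClassLoader)
println(sparkSession.sparkContext.getConf.getOption("spark.repl.class.uri"))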
I don't have a definite answer, but at the very end, the package is set in generated code around here. By navigating your way in the Ammonite code from there, it may be possible to pass a custom package prefix to ammonite.interp.Interpreter so that it ends up being used there.

Alternatively, the cmd prefix in the class name is defined here. Just allowing it to be customized via a field passed to ammonite.interp.Interpreter (and setting it yourself to "cmd1_", "cmd2_", etc., so that the classes are named "cmd1_1", "cmd1_2", …) may work as well, and could be more straightforward.

The Ammonite README has documentation detailing manual commands to quickly test that kind of change.
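To make the suggested naming scheme concrete (purely an illustrative sketch; the helper below is hypothetical and not an existing Ammonite or almond API, it only shows how per-interpreter prefixes would keep generated class names distinct):

// Hypothetical helper: each interpreter gets its own prefix, so wrapper class
// names like cmd1_1 and cmd2_1 can never collide across interpreters.
def wrapperName(interpreterId: Int, commandIndex: Int): String =
  s"cmd${interpreterId}_$commandIndex"

assert(wrapperName(1, 1) == "cmd1_1") // first command of interpreter 1
assert(wrapperName(2, 1) == "cmd2_1") // first command of interpreter 2, distinct name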
I'm not sure I understand what you're trying to achieve: you have a custom kernel, using ScalaInterpreter, and you're trying to have it act like multiple kernels at once, all sharing the same SparkSession, right?

If that's the case, one issue I see is that each instance of ScalaInterpreter will set spark.repl.class.uri in the Spark conf. This setting corresponds to the URI of a small web server that ammonite-spark launches to serve the byte code of the classes generated during the session. I'm not sure which one Spark will retain in the end, but only one of them will likely be accessible. Parts of the logic of ammonite-spark would need to be customized so that only one such server is spawned for all the sessions.

Another problem you'll likely run into (I see you alluded to it on the Ammonite gitter) is that the ScalaInterpreters will generate classes with similar names (like cmd1, cmd2, etc.), which is going to be a problem for the Spark executors, which won't be able to distinguish between the classes of the various interpreters. That may require some customization in Ammonite (to put all classes of a session in a custom package, for example).
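To illustrate the spark.repl.class.uri point above (a sketch with placeholder URIs, using only the plain SparkConf API; it just shows that a single value per key survives, which is why at most one class server would stay reachable):

import org.apache.spark.SparkConf

// Two interpreters each setting the key on the same conf: the last write wins,
// so executors would only ever see one class-server URI.
val conf = new SparkConf()
conf.set("spark.repl.class.uri", "http://interpreter-1.example:12345") // placeholder
conf.set("spark.repl.class.uri", "http://interpreter-2.example:23456") // placeholder
println(conf.get("spark.repl.class.uri")) // prints the second URI only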