Spark on Kubernetes
I am testing Spark on Kubernetes, launched through the almond Jupyter kernel.
My conclusion for now is that pure DataFrame operations work out of the box (assuming the HTTP file system is installed), but any lambda function seems to fail.
I noticed that the Spark shell uses the spark:// protocol for REPL classes, while almond uses the http protocol:
2019-03-27 19:10:46 INFO Executor:54 - Using REPL class URI: http://xxx.xxx.xxx.xxx:xxxx
That's why you need the Hadoop HTTP filesystem (added in Hadoop 2.9.x).
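For reference, a minimal sketch of the two cases described above, as almond notebook cells. This needs a live cluster to run, and the library versions and the Kubernetes master URL are placeholders, not values from the original report:

```scala
// Illustrative almond notebook cells; versions and master URL are placeholders.
import $ivy.`org.apache.spark::spark-sql:2.4.0`
import $ivy.`sh.almond::almond-spark:0.6.0`

import org.apache.spark.sql._

val spark = NotebookSparkSession.builder()
  .master("k8s://https://<api-server>:<port>") // remote Kubernetes master
  .getOrCreate()

import spark.implicits._

// Pure DataFrame operations work out of the box:
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
df.filter(df("id") > 1).show()

// But shipping a REPL-defined lambda to the executors fails with the
// ClassCastException shown below:
spark.sparkContext.parallelize(1 to 10).map(_ * 2).collect()
```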
However, there is a problem that may prevent almond from fully operating:
2019-03-27 19:10:47 ERROR ExecutorClassLoader:91 - Failed to check existence of class ammonite.$sess.cmd6$Helper$$anonfun$2 on REPL class server at http://xxx.xxx.xxx.xxx:xxxx
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.Path.<init>(Path.java:110)
at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromFileSystem(ExecutorClassLoader.scala:115)
After that, any lambda function inside a Spark map, etc. fails with the following exception:
2019-03-27 19:10:47 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
Issue Analytics
- Created 4 years ago
- Comments: 6
I did more investigating on this. I was able to open firewall access for the random port using a network policy (I run Spark on Kubernetes) and was able to connect. But now, instead of failing to check the existence of the class, it gets an empty string. So it's the same error, slightly different:
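For anyone trying the same workaround, the network-policy change described here looks roughly like this. The namespace, labels, and selectors are placeholders for my setup, not something from the original report:

```yaml
# Hypothetical NetworkPolicy allowing executor pods to reach the driver's
# REPL class server; labels are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-repl-class-server
spec:
  podSelector:
    matchLabels:
      app: spark-driver          # the almond/driver pod
  ingress:
    - from:
        - podSelector:
            matchLabels:
              spark-role: executor
      # No ports listed: the REPL class server binds a random port, so this
      # allows all traffic from executor pods to the driver.
```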
org.apache.spark.repl.RemoteClassLoaderError: ammonite.$sess.cmd7$Helper
But if I use a local master with .master("local[*]") it works fine. For some reason the REPL class server returns empty strings when using a remote master.
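For completeness, the working control case is the same session with a local master, e.g.:

```scala
// Same notebook, but with an in-process master: lambdas work fine here,
// since executors share the driver's JVM and classloader, so no class
// fetching over the REPL class server is needed.
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .getOrCreate()
```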
I also tried loading different versions of Ammonite and Spark, and different versions of Scala. Once all the versions are lined up, I get the same error every time.
Another update: it doesn't seem to be an almond issue. I started an Ammonite shell and got the same results. Something interesting, though, that may be useful:
I wonder if it's normal that the resource IDs are different.