Spark on Kubernetes
I am testing Spark on Kubernetes, launched through the almond Jupyter kernel.
My conclusion for now is that pure DataFrame operations work out of the box (assuming the HTTP file system is installed), but any lambda function seems to fail.
I noticed that the Spark shell uses the spark:// protocol for REPL classes, while almond uses the http protocol:
2019-03-27 19:10:46 INFO Executor:54 - Using REPL class URI: http://xxx.xxx.xxx.xxx:xxxx
That's why you need the Hadoop HTTP filesystem (added in Hadoop 2.9.x).
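For reference, a minimal sketch of the two cases described above, as almond notebook cells. This needs a live cluster to run, and the library versions and the Kubernetes master URL are placeholders, not values from the original report:

```scala
// Illustrative almond notebook cells; versions and master URL are placeholders.
import $ivy.`org.apache.spark::spark-sql:2.4.0`
import $ivy.`sh.almond::almond-spark:0.6.0`

import org.apache.spark.sql._

val spark = NotebookSparkSession.builder()
  .master("k8s://https://<api-server>:<port>") // remote Kubernetes master
  .getOrCreate()

import spark.implicits._

// Pure DataFrame operations work out of the box:
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
df.filter(df("id") > 1).show()

// But shipping a REPL-defined lambda to the executors fails with the
// ClassCastException shown below:
spark.sparkContext.parallelize(1 to 10).map(_ * 2).collect()
```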
However, there is a problem that may prevent almond from fully operating:
2019-03-27 19:10:47 ERROR ExecutorClassLoader:91 - Failed to check existence of class ammonite.$sess.cmd6$Helper$$anonfun$2 on REPL class server at http://xxx.xxx.xxx.xxx:xxxx
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.Path.<init>(Path.java:110)
at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromFileSystem(ExecutorClassLoader.scala:115)
After that, any lambda function inside a Spark map, etc. fails with the following exception:
2019-03-27 19:10:47 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
Issue Analytics
- Created 4 years ago
- Comments: 6
I did more investigating on this. I was able to open firewall access for the random port using a network policy (I run Spark on Kubernetes) and was able to connect. But now, instead of failing to check the existence of the class, it gets an empty string. So it's the same error, slightly different:
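For anyone trying the same workaround, the network-policy change described here looks roughly like this. The namespace, labels, and selectors are placeholders for my setup, not something from the original report:

```yaml
# Hypothetical NetworkPolicy allowing executor pods to reach the driver's
# REPL class server; labels are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-repl-class-server
spec:
  podSelector:
    matchLabels:
      app: spark-driver          # the almond/driver pod
  ingress:
    - from:
        - podSelector:
            matchLabels:
              spark-role: executor
      # No ports listed: the REPL class server binds a random port, so this
      # allows all traffic from executor pods to the driver.
```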
org.apache.spark.repl.RemoteClassLoaderError: ammonite.$sess.cmd7$Helper
But if I use a local master with .master("local[*]") it works fine. For some reason the REPL class server returns empty strings when using a remote master.
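For completeness, the working control case is the same session with a local master, e.g.:

```scala
// Same notebook, but with an in-process master: lambdas work fine here,
// since executors share the driver's JVM and classloader, so no class
// fetching over the REPL class server is needed.
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .getOrCreate()
```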
I also tried loading different versions of Ammonite and Spark, and different versions of Scala. Once all the versions are lined up, I get the same error every time.
Another update: it doesn't seem to be an almond issue. I started an Ammonite shell and got the same results. Something interesting, though, that may be useful:
I wonder if it's normal that the resource IDs are different.