Spark Standalone Cluster issue: [No FileSystem for scheme: http]
I am trying to use Spark 3.0 with a local standalone cluster setup: I simply create one master and one worker locally. However, the job keeps crashing with the following error:
20/11/23 15:55:24 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to null
...
Caused by: java.io.IOException: No FileSystem for scheme: http
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
It seems all the jars are uploaded to the remote Spark server, and the crash happens while Spark tries to fetch them.
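For context, the exception comes straight from Hadoop's FileSystem API, which the executors use to download those jars (see the stack trace above). A minimal sketch of the failure, assuming the Hadoop 2.7 build bundled with this Spark distribution (the URL is just a placeholder):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Hadoop 2.7 registers no FileSystem implementation for the "http" scheme,
// so this throws java.io.IOException: No FileSystem for scheme: http
FileSystem.get(new URI("http://localhost:12345/some.jar"), new Configuration())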
import $ivy.`org.apache.spark:spark-sql_2.12:3.0.0`
// NotebookSparkSession is provided by almond-spark (assumption: loaded via
// something like import $ivy.`sh.almond::almond-spark:0.10.1` in the kernel)
import org.apache.spark.sql._

val spark = {
  NotebookSparkSession.builder()
    .master("spark://localhost:7077")
    .getOrCreate()
}

// Print the effective configuration, then run a trivial job
spark.conf.getAll.foreach(pair => println(pair._1 + ":" + pair._2))

def sc = spark.sparkContext

val rdd = sc.parallelize(1 to 100000000, 100)
val n = rdd.map(_ + 1).sum()
You can reproduce this by running the code above (please create a standalone cluster first):
# Download and unpack the Spark 3.0.0 / Hadoop 2.7 build
curl -O https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
tar zxvf spark-3.0.0-bin-hadoop2.7.tgz
mv spark-3.0.0-bin-hadoop2.7/ spark

# Start one master and one worker on localhost
export SPARK_MASTER_HOST=localhost
export SPARK_WORKER_INSTANCES=1
./spark/sbin/start-master.sh
./spark/sbin/start-slave.sh spark://localhost:7077
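Before running the notebook code, it may be worth confirming the cluster actually came up. A quick check, assuming the default master web UI port 8080:

# Both daemons should show up as JVM processes
jps | grep -E 'Master|Worker'
# The master web UI should report one alive worker
curl -s http://localhost:8080 | grep -io 'alive workers[^<]*'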
#637 seems to be the same issue.
Issue Analytics
- State:
- Created 3 years ago
- Comments: 7
Top Results From Across the Web
- Why does Spark fail with "No File System for scheme: local"?
  This is because Spark Standalone does not understand it and, unless I'm mistaken, the only cluster environments that support this local URI ...
- 'No FileSystem for scheme: hdfs' exception when running ...
  I have a 13 nodes cdh4.1.1 cluster, and I want to run spark on yarn. Everything is ok in the beginning except that...
- spark-shell error : No FileSystem for scheme: wasb
  Hi. We have HDInsight cluster in Azure running, but it doesn't allow to spin up edge/gateway node at the time of cluster creation....
- Spark Standalone Mode - Spark 3.3.1 Documentation
  To launch a Spark standalone cluster with the launch scripts, you should create a file called conf/workers in your Spark directory, which must...
- Using Input and Output (I/O) · Spark
  scala> sc.textFile("http://japila.pl").foreach(println) java.io.IOException: No FileSystem for scheme: http at org.
@lanking520 I found a workaround. Get spark-stubs_30_2.12-0.10.1.jar from https://search.maven.org/artifact/sh.almond/spark-stubs_30_2.12/0.10.1/jar. Put it somewhere the executors will be able to load it from their filesystem (i.e. on each worker node’s filesystem). In my case, I’m using NFS, so I put it there and the executors can read the file from NFS. Set the spark.executor.extraClassPath Spark configuration setting to the executor-filesystem location of spark-stubs_30_2.12-0.10.1.jar. Follow the standard instructions for constructing a SparkSession from a NotebookSparkSession.builder() and try from there. I believe this should work. LMK
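For illustration, a minimal sketch of that workaround applied in the session builder (the /nfs/jars path is hypothetical; point it at wherever the stubs jar actually lives on each worker):

import org.apache.spark.sql._

val spark = NotebookSparkSession.builder()
  .master("spark://localhost:7077")
  // Hypothetical NFS path; must be readable at this location on every worker
  .config("spark.executor.extraClassPath", "/nfs/jars/spark-stubs_30_2.12-0.10.1.jar")
  .getOrCreate()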
@mallman Yeah, got it. Probably I need to add this to my spark.conf file before I launch the worker node.
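For reference, in a properties file that would presumably be a single line (same hypothetical path as in the sketch above):

# Hypothetical path; the jar must exist at this location on every worker node
spark.executor.extraClassPath /nfs/jars/spark-stubs_30_2.12-0.10.1.jar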