
Spark Standalone Cluster issue: [No FileSystem for scheme: http]

See original GitHub issue

I am trying to use Spark 3.0 with a local standalone cluster setup. I simply create one master and one worker locally. However, the job keeps crashing with the following error:

20/11/23 15:55:24 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to null
...
Caused by: java.io.IOException: No FileSystem for scheme: http
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)

It seems all the jars are uploaded to the Spark cluster, and Spark fails while trying to fetch them.
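For context, the stack trace comes from Hadoop's FileSystem scheme lookup: no implementation is registered for the http scheme out of the box, so opening an http:// URL through the Hadoop FileSystem API fails exactly like this. A minimal sketch of just that lookup failure (not from the issue; it assumes hadoop-common 2.7.x, the version bundled with spark-3.0.0-bin-hadoop2.7, and a made-up URL):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// FileSystem.get resolves an implementation class from the URI scheme.
// Nothing is registered under fs.http.impl by default, so this throws
// java.io.IOException: No FileSystem for scheme: http
val conf = new Configuration()
val fs = FileSystem.get(new URI("http://localhost:8888/some.jar"), conf)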

import $ivy.`org.apache.spark:spark-sql_2.12:3.0.0`
import org.apache.spark.sql._

// Runs inside an almond (Jupyter Scala) notebook; NotebookSparkSession is
// provided by almond's Spark integration.
val spark = {
  NotebookSparkSession.builder()
    .master("spark://localhost:7077")
    .getOrCreate()
}

// Print the effective Spark configuration.
spark.conf.getAll.foreach(pair => println(pair._1 + ":" + pair._2))

def sc = spark.sparkContext

// A simple job to exercise the cluster; the executors crash before it completes.
val rdd = sc.parallelize(1 to 100000000, 100)
val n = rdd.map(_ + 1).sum()

You can reproduce it with the notebook code above (please create a standalone cluster beforehand):

# download and unpack the Spark 3.0.0 (Hadoop 2.7) build
curl -O https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
tar zxvf spark-3.0.0-bin-hadoop2.7.tgz
mv spark-3.0.0-bin-hadoop2.7/ spark

# start one master and one worker locally
export SPARK_MASTER_HOST=localhost
export SPARK_WORKER_INSTANCES=1
./spark/sbin/start-master.sh
./spark/sbin/start-slave.sh spark://localhost:7077

#637 seems to have the same issue.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
mallman commented, Dec 10, 2020

@lanking520 I found a workaround. Get spark-stubs_30_2.12-0.10.1.jar from https://search.maven.org/artifact/sh.almond/spark-stubs_30_2.12/0.10.1/jar. Put it somewhere the executors can load it from their own filesystem (i.e. on each worker node). In my case I'm using NFS, so I put it there and the executors read the file from NFS. Set the spark.executor.extraClassPath Spark configuration setting to the location of spark-stubs_30_2.12-0.10.1.jar on the executor filesystem. Then follow the standard instructions for constructing a SparkSession from NotebookSparkSession.builder() and try from there. I believe this should work. LMK
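A rough sketch of what that workaround could look like in the notebook, assuming the stub jar has been copied to a path visible to every worker (the /mnt/nfs path below is made up) and that the almond builder accepts the usual SparkSession.Builder .config call:

import org.apache.spark.sql._

// Hypothetical shared-filesystem location of the stub jar on every worker node.
val stubsJar = "/mnt/nfs/spark-jars/spark-stubs_30_2.12-0.10.1.jar"

val spark = NotebookSparkSession.builder()
  .master("spark://localhost:7077")
  // Per the workaround: point spark.executor.extraClassPath at the jar's
  // location on the executor filesystem.
  .config("spark.executor.extraClassPath", stubsJar)
  .getOrCreate()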

0 reactions
lanking520 commented, Dec 14, 2020

@mallman Yeah, got it. I probably need to add this to my spark.conf file before I launch the worker node.
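If that route is taken, the entry would presumably be the usual properties-file line, typically in conf/spark-defaults.conf (the path below is hypothetical and matches the NFS example above):

# conf/spark-defaults.conf
spark.executor.extraClassPath /mnt/nfs/spark-jars/spark-stubs_30_2.12-0.10.1.jar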

Read more comments on GitHub >

Top Results From Across the Web

  • Why does Spark fail with "No File System for scheme: local"?
    This is because Spark Standalone does not understand it and, unless I'm mistaken, the only cluster environments that support this local URI ...
  • 'No FileSystem for scheme: hdfs' exception when running ...
    I have a 13 nodes cdh4.1.1 cluster, and I want to run spark on yarn. Everything is ok in the beginning except that...
  • spark-shell error : No FileSystem for scheme: wasb
    Hi,. We have HDInsight cluster in Azure running, but it doesn't allow to spin up edge/gateway node at the time of cluster creation....
  • Spark Standalone Mode - Spark 3.3.1 Documentation
    To launch a Spark standalone cluster with the launch scripts, you should create a file called conf/workers in your Spark directory, which must...
  • Using Input and Output (I/O) · Spark
    scala> sc.textFile("http://japila.pl").foreach(println) java.io.IOException: No FileSystem for scheme: http at org.
