
Spark Standalone Cluster issue: [No FileSystem for scheme: http]

See original GitHub issue

I am trying to use Spark 3.0 with a local standalone cluster setup. I simply create one master and one worker locally. However, the job keeps crashing with the following error:

20/11/23 15:55:24 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to null
...
Caused by: java.io.IOException: No FileSystem for scheme: http
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)

It seems all the jars are uploaded to the Spark cluster, and Spark fails while trying to fetch them.
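For context, the stack trace comes from Hadoop's FileSystem scheme lookup: no implementation is registered for the http scheme out of the box, so opening an http:// URL through the Hadoop FileSystem API fails exactly like this. A minimal sketch of just that lookup failure (not from the issue; it assumes hadoop-common 2.7.x, the version bundled with spark-3.0.0-bin-hadoop2.7, and a made-up URL):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// FileSystem.get resolves an implementation class from the URI scheme.
// Nothing is registered under fs.http.impl by default, so this throws
// java.io.IOException: No FileSystem for scheme: http
val conf = new Configuration()
val fs = FileSystem.get(new URI("http://localhost:8888/some.jar"), conf)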

import $ivy.`org.apache.spark:spark-sql_2.12:3.0.0`
import org.apache.spark.sql._

// Runs inside an almond (Jupyter Scala) notebook; NotebookSparkSession is
// provided by almond's Spark integration.
val spark = {
  NotebookSparkSession.builder()
    .master("spark://localhost:7077")
    .getOrCreate()
}

// Print the effective Spark configuration.
spark.conf.getAll.foreach(pair => println(pair._1 + ":" + pair._2))

def sc = spark.sparkContext

// A simple job to exercise the cluster; the executors crash before it completes.
val rdd = sc.parallelize(1 to 100000000, 100)
val n = rdd.map(_ + 1).sum()

You can reproduce it with the notebook code above (please create a standalone cluster beforehand):

# download and unpack the Spark 3.0.0 (Hadoop 2.7) build
curl -O https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
tar zxvf spark-3.0.0-bin-hadoop2.7.tgz
mv spark-3.0.0-bin-hadoop2.7/ spark

# start one master and one worker locally
export SPARK_MASTER_HOST=localhost
export SPARK_WORKER_INSTANCES=1
./spark/sbin/start-master.sh
./spark/sbin/start-slave.sh spark://localhost:7077

#637 seems to have the same issue.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
mallman commented, Dec 10, 2020

@lanking520 I found a workaround. Get spark-stubs_30_2.12-0.10.1.jar from https://search.maven.org/artifact/sh.almond/spark-stubs_30_2.12/0.10.1/jar. Put it somewhere the executors can load it from their own filesystem (i.e. on each worker node). In my case I'm using NFS, so I put it there and the executors read the file from NFS. Set the spark.executor.extraClassPath Spark configuration setting to the location of spark-stubs_30_2.12-0.10.1.jar on the executor filesystem. Then follow the standard instructions for constructing a SparkSession from NotebookSparkSession.builder() and try from there. I believe this should work. LMK
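A rough sketch of what that workaround could look like in the notebook, assuming the stub jar has been copied to a path visible to every worker (the /mnt/nfs path below is made up) and that the almond builder accepts the usual SparkSession.Builder .config call:

import org.apache.spark.sql._

// Hypothetical shared-filesystem location of the stub jar on every worker node.
val stubsJar = "/mnt/nfs/spark-jars/spark-stubs_30_2.12-0.10.1.jar"

val spark = NotebookSparkSession.builder()
  .master("spark://localhost:7077")
  // Per the workaround: point spark.executor.extraClassPath at the jar's
  // location on the executor filesystem.
  .config("spark.executor.extraClassPath", stubsJar)
  .getOrCreate()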

0 reactions
lanking520 commented, Dec 14, 2020

@mallman Yeah, got it. I probably need to add this to my spark.conf file before I launch the worker node.
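If that route is taken, the entry would presumably be the usual properties-file line, typically in conf/spark-defaults.conf (the path below is hypothetical and matches the NFS example above):

# conf/spark-defaults.conf
spark.executor.extraClassPath /mnt/nfs/spark-jars/spark-stubs_30_2.12-0.10.1.jar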

Read more comments on GitHub >

Top Results From Across the Web

  • Why does Spark fail with "No File System for scheme: local"?
    This is because Spark Standalone does not understand it and, unless I'm mistaken, the only cluster environments that support this local URI ...
  • 'No FileSystem for scheme: hdfs' exception when running ...
    I have a 13 nodes cdh4.1.1 cluster, and I want to run spark on yarn. Everything is ok in the beginning except that...
  • spark-shell error : No FileSystem for scheme: wasb
    Hi,. We have HDInsight cluster in Azure running, but it doesn't allow to spin up edge/gateway node at the time of cluster creation....
  • Spark Standalone Mode - Spark 3.3.1 Documentation
    To launch a Spark standalone cluster with the launch scripts, you should create a file called conf/workers in your Spark directory, which must...
  • Using Input and Output (I/O) · Spark
    scala> sc.textFile("http://japila.pl").foreach(println) java.io.IOException: No FileSystem for scheme: http at org.
