
Clarification about installation with Spark


Hi,

I am trying to install the GCS connector. The documentation states:

Note that you do not need to configure Hadoop in order to use the GCS connector with Spark.

But I am getting the following exception:

java.io.IOException: No FileSystem for scheme: gs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkA
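For context, this exception is Hadoop reporting that it has no FileSystem implementation registered for the gs:// scheme, so any access to a gs:// path fails before the connector itself is involved. A minimal sketch of the kind of call that triggers it (the bucket and path are placeholders):

import org.apache.spark.sql.SparkSession

// Sketch only: without the GCS connector jar on the classpath and fs.gs.impl set,
// this read fails with "java.io.IOException: No FileSystem for scheme: gs".
val spark = SparkSession.builder().appName("gcs-example").getOrCreate()
val df = spark.read.parquet("gs://some-bucket/some/path") // placeholder bucket/path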

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Top GitHub Comments

3 reactions
medb commented, Sep 23, 2019

Yes, this is a shaded jar.

A shaded jar is a library jar that bundles all of its dependencies; it has a -shaded suffix in its name in the Maven repo and when you build the project: https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.0.0/
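For example, in an sbt build the shaded artifact would be pulled in via its classifier. This is only a sketch, assuming the coordinates and the shaded classifier match the repo listing above:

// build.sbt sketch: depend on the shaded GCS connector artifact
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.0.0" classifier "shaded"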

3 reactions
hadrienk commented, Jun 20, 2019

Ok, I got it working:

"spark.hadoop.fs.gs.impl": {
    "value": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "type": "string"
},
"spark.hadoop.google.cloud.auth.service.account.enable": {
    "value": true,
    "type": "checkbox"
},
"spark.hadoop.google.cloud.auth.service.account.json.keyfile": {
    "value": "/zeppelin/key.json",
    "type": "string"
}

Maybe the PR (https://github.com/GoogleCloudPlatform/bigdata-interop/pull/180/files) is using a newer version? com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem was not found in the jar from the documentation.
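Outside Zeppelin, the same properties could be set directly on the SparkSession builder. This is only a sketch mirroring the values from the comment above (the GoogleHadoopFS class name and the keyfile path come from that comment and are not verified against other connector versions); it assumes the shaded connector jar is on the driver and executor classpath:

import org.apache.spark.sql.SparkSession

// Sketch: same settings as the Zeppelin interpreter config above.
val spark = SparkSession.builder()
  .appName("gcs-example")
  .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/zeppelin/key.json")
  .getOrCreate()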
