
Clarification about installation with Spark


Hi,

I am trying to install the GCS connector. The documentation states:

Note that you do not need to configure Hadoop in order to use the GCS connector with Spark.

But I am getting the following exception:

java.io.IOException: No FileSystem for scheme: gs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkA
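For context, this exception is Hadoop reporting that it has no FileSystem implementation registered for the gs:// scheme, so any access to a gs:// path fails before the connector itself is involved. A minimal sketch of the kind of call that triggers it (the bucket and path are placeholders):

import org.apache.spark.sql.SparkSession

// Sketch only: without the GCS connector jar on the classpath and fs.gs.impl set,
// this read fails with "java.io.IOException: No FileSystem for scheme: gs".
val spark = SparkSession.builder().appName("gcs-example").getOrCreate()
val df = spark.read.parquet("gs://some-bucket/some/path") // placeholder bucket/path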

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Top GitHub Comments

3 reactions
medb commented, Sep 23, 2019

Yes, this is a shaded jar.

A shaded jar is a library jar that bundles all of its dependencies; it has a -shaded suffix in its name in the Maven repo and when you build the project: https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.0.0/
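For example, in an sbt build the shaded artifact would be pulled in via its classifier. This is only a sketch, assuming the coordinates and the shaded classifier match the repo listing above:

// build.sbt sketch: depend on the shaded GCS connector artifact
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.0.0" classifier "shaded"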

3 reactions
hadrienk commented, Jun 20, 2019

Ok, I got it working:

"spark.hadoop.fs.gs.impl": {
    "value": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "type": "string"
},
"spark.hadoop.google.cloud.auth.service.account.enable": {
    "value": true,
    "type": "checkbox"
},
"spark.hadoop.google.cloud.auth.service.account.json.keyfile": {
    "value": "/zeppelin/key.json",
    "type": "string"
}

Maybe the PR (https://github.com/GoogleCloudPlatform/bigdata-interop/pull/180/files) is using a newer version? com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem was not found in the jar from the documentation.
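Outside Zeppelin, the same properties could be set directly on the SparkSession builder. This is only a sketch mirroring the values from the comment above (the GoogleHadoopFS class name and the keyfile path come from that comment and are not verified against other connector versions); it assumes the shaded connector jar is on the driver and executor classpath:

import org.apache.spark.sql.SparkSession

// Sketch: same settings as the Zeppelin interpreter config above.
val spark = SparkSession.builder()
  .appName("gcs-example")
  .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/zeppelin/key.json")
  .getOrCreate()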
