Clarification about installation with spark
Hi,
I am trying to install the GCS connector. The documentation states:
Note that you do not need to configure Hadoop in order to use the GCS connector with Spark.
But I am getting the following exception:
java.io.IOException: No FileSystem for scheme: gs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkA
Issue Analytics
- State:
- Created 4 years ago
- Comments:17 (7 by maintainers)
Yes, this is a shaded jar.
A shaded jar is a library jar that bundles all of its dependencies. It has a -shaded suffix in its name, both in the Maven repo and when you build the project: https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.0.0/

Ok, I got it working:
Maybe the PR is using a newer version? https://github.com/GoogleCloudPlatform/bigdata-interop/pull/180/files I ask because com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem was not found in the jar from the documentation.
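
For anyone hitting the same "class not found in the jar" situation, here is a minimal sketch of how one could check whether a given connector jar actually contains the class before putting it on Spark's classpath. It is not from the thread: the helper name jar_contains_class is hypothetical, and it relies only on the fact that a jar is a zip archive in which a class appears as a path ending in .class.

```python
import sys
import zipfile

# Class name quoted in the thread above.
GCS_FS_CLASS = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"


def jar_contains_class(jar_path: str, class_name: str = GCS_FS_CLASS) -> bool:
    """Return True if the jar at jar_path has an entry for class_name.

    Hypothetical helper for illustration: a jar is a zip archive, and a
    class a.b.C is stored under the entry name a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Usage: python check_jar.py path/to/connector.jar [more.jar ...]
    for path in sys.argv[1:]:
        status = "found" if jar_contains_class(path) else "missing"
        print(f"{path}: {GCS_FS_CLASS} {status}")
```

If the class is reported as missing, the jar is probably a different version or build of the connector than the one the configuration expects, which would be consistent with the "No FileSystem for scheme: gs" error above.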