'No FileSystem for scheme: gs' Error
I'm running a PySpark application on a standalone Spark cluster. I can see that the GCS connector (gcs-connector-hadoop2-2.0.1.jar) is added successfully when I submit the application:
21/01/25 11:46:25 INFO SparkContext: Added JAR /opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar at spark://pyspark-1.c.master.internal:43981/jars/gcs-connector-hadoop2-2.0.1.jar with timestamp 1611575185463
but the gs:// prefix isn't recognized.
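A minimal sketch of the read that triggers it (the bucket and path below are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reading JSON from a gs:// path; this fails because no Hadoop FileSystem
# implementation is registered for the "gs" scheme on the driver.
df = spark.read.json("gs://some-bucket/path/to/data")  # placeholder path
```

The read fails with: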
py4j.protocol.Py4JJavaError: An error occurred while calling o41.json.
: java.io.IOException: No FileSystem for scheme: gs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:477)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
This is my configuration:
{
  "spark.jars": "/opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar",
  "google.cloud.auth.service.account.enable": "true",
  "google.cloud.auth.service.account.json.keyfile": "/path/to/service/account/file.json"
}
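For context, the last two keys are Hadoop filesystem settings rather than Spark settings, so one common way to pass them from PySpark is with the spark.hadoop. prefix, which forwards them into the Hadoop configuration. A sketch, not necessarily how my cluster injects them:

```python
from pyspark.sql import SparkSession

# The "spark.hadoop." prefix forwards a Spark conf entry into the
# Hadoop Configuration used by the FileSystem layer.
spark = (
    SparkSession.builder
    .config("spark.jars", "/opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service/account/file.json")
    .getOrCreate()
)
```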
Then, I tried adding the following configuration:
"fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
but I'm still getting an error:
py4j.protocol.Py4JJavaError: An error occurred while calling o45.json.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:477)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
... 25 more
Hope you can help, thank you!
@medb Thank you, the solution that helped me was to add --driver-class-path to the spark-submit command.

Also, please take a look at https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/180, maybe this information will be useful to solve your issue.
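For anyone hitting the same errors: spark.jars ships the connector to executors, but the gs:// scheme is first resolved in the driver JVM, so the jar also has to be on the driver's classpath. A sketch of a complete submit command combining the fix above with the fs.gs.impl settings (paths and the application file name are placeholders):

```bash
spark-submit \
  --jars /opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar \
  --driver-class-path /opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/service/account/file.json \
  app.py
```

The fs.AbstractFileSystem.gs.impl entry is only needed for code paths that go through the newer Hadoop FileContext API; for plain spark.read it is usually optional.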