java.io.IOException: Error getting 'BUCKET_GLOBAL_IDENTIFIER' bucket
See original GitHub issue

I want to enable Spark to export data to Google Cloud Storage instead of saving it on HDFS. To achieve this, I have installed the Google Cloud Storage Connector for Spark. Here is a sample of the code, run inside a Spark context, that I use to save a DataFrame to a bucket:
import spark.implicits._  // needed for toDF when not running in spark-shell

// Sample DataFrame to export
val someDF = Seq(
  (8, "bat"),
  (64, "mouse"),
  (-27, null)
).toDF("number", "word")

// Configure the GCS connector on the Hadoop configuration of the Spark context
val conf = sc.hadoopConfiguration
conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
conf.set("fs.gs.project.id", PROJECT_ID)
conf.set("fs.gs.auth.service.account.enable", "true")
conf.set("fs.gs.auth.service.account.json.keyfile", LOCATION_TO_KEY_JSON) // path to the service account JSON key file

someDF
  .write
  .format("parquet")
  .mode("overwrite")
  .save(s"gs://BUCKET_GLOBAL_IDENTIFIER/A_FOLDER_IN_A_BUCKET/")
I receive a rather cryptic exception after the code is executed:
java.io.IOException: Error getting 'BUCKET_GLOBAL_IDENTIFIER' bucket
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$8.onFailure(GoogleCloudStorageImpl.java:1633)
at com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:183)
at com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageExceptions.createJsonResponseException(GoogleCloudStorageExceptions.java:82)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$8.onFailure(GoogleCloudStorageImpl.java:1624)
... 6 more
Could anyone give me a clue on how to tackle this? Here is a list of issues I have already solved to get to this point:
- The key could not be accessed by Spark. The issue was that it was not available on the physical nodes that Spark was running on.
- The GCS service account used for the Spark connector did not have permission to create a bucket. The issue was solved by saving the data to an already existing bucket (a way to re-check the account's access to that bucket is sketched after this list).
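A minimal sketch of such a check, assuming the same Spark context, Hadoop configuration and BUCKET_GLOBAL_IDENTIFIER placeholder as in the code above; it exercises the connector's bucket lookup without running the full Parquet write:

import org.apache.hadoop.fs.Path

val bucketPath = new Path("gs://BUCKET_GLOBAL_IDENTIFIER/")
val fs = bucketPath.getFileSystem(sc.hadoopConfiguration)
// exists() goes through the GCS connector and performs the same kind of
// bucket metadata lookup that fails in the stack trace below, so it is
// expected to fail with a similar IOException if the service account
// cannot access the bucket, and to print true if access is fine.
println(fs.exists(bucketPath))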
Issue Analytics
- Created 4 years ago
- Comments: 6 (2 by maintainers)
This error means that the configured service account doesn't have access to the <BUCKET_GLOBAL_IDENTIFIER> bucket or doesn't have permission to perform bucket get requests. Could you test your configuration by specifying a non-existent bucket? (The GCS connector should create this bucket by itself in that case.)
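For illustration, a minimal sketch of that test, assuming the same someDF and configuration as in the question; the bucket name below is a hypothetical one that must not already exist in the project:

someDF
  .write
  .format("parquet")
  .mode("overwrite")
  .save("gs://some-nonexistent-test-bucket/test/") // hypothetical bucket name
// If the GCS connector creates this bucket itself, project-level access is
// working and the problem is specific to BUCKET_GLOBAL_IDENTIFIER; if this
// call fails too, the service account's permissions are the likely cause.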
@ashishdeok15 please open a new issue and provide a detailed description, including code snippets of what you are doing and the exception w/ stack trace that you are facing.