Sometimes getting "Requester pays bucket access requires authentication." from Google
Hello,
I’m using Cromwell with Google Pipelines as the backend, and sometimes (perhaps when more than 30 analyses are running at the same time) I get workflow errors (~2 of the 30). When inspecting the metadata for a failed workflow, I can see an error message containing “ServiceException: 401 Requester pays bucket access requires authentication.”.
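For context, this is how Requester Pays access can be checked from the command line (the bucket and billing-project names below are placeholders):

```shell
# Check whether a bucket has Requester Pays enabled.
gsutil requesterpays get gs://my-input-bucket

# If it is enabled, every request must name a billing project via -u;
# otherwise GCS rejects the request with errors like the 401 above.
gsutil -u my-billing-project ls gs://my-input-bucket
```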
Edit: Using Cromwell 35
Has anyone had a similar problem? Here is the WDL task that is affected (from Broad’s five dollar genome workflow):
task BaseRecalibrator {
  File input_bam
  File input_bam_index
  String recalibration_report_filename
  Array[String] sequence_group_interval
  File dbSNP_vcf
  File dbSNP_vcf_index
  Array[File] known_indels_sites_VCFs
  Array[File] known_indels_sites_indices
  File ref_dict
  File ref_fasta
  File ref_fasta_index
  Int disk_size
  Int preemptible_tries

  command {
    /usr/gitc/gatk4/gatk-launch --javaOptions "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal \
      -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
      -Xloggc:gc_log.log -Xms4000m" \
      BaseRecalibrator \
      -R ${ref_fasta} \
      -I ${input_bam} \
      --useOriginalQualities \
      -O ${recalibration_report_filename} \
      -knownSites ${dbSNP_vcf} \
      -knownSites ${sep=" -knownSites " known_indels_sites_VCFs} \
      -L ${sep=" -L " sequence_group_interval}
  }
  runtime {
    docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"
    memory: "6 GB"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }
  output {
    File recalibration_report = "${recalibration_report_filename}"
  }
}
And here is my Cromwell server config:
include required(classpath("application"))

webservice {
  port = 8000
}

system {
  workflow-restart = true
}

engine {
  filesystems {
    gcs {
      auth = "service-account"
    }
    http {}
    local {
      localization: [
        "hard-link", "soft-link", "copy"
      ]
    }
  }
}

backend {
  default = "Local"
  providers {
    Local {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        max-concurrent-workflows = 1
        concurrent-job-limit = 1
      }
    }
    PAPIv2 {
      actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
      config {
        project = "bioinfo-XXXXXXX"
        root = "gs://XXXXXXXX"
        genomics-api-queries-per-100-seconds = 1000
        max-concurrent-workflows = 80
        concurrent-job-limit = 200
        maximum-polling-interval = 600
        genomics {
          # Config from google stanza
          auth = "service-account"
          # Endpoint for APIs, no reason to change this unless directed by Google.
          endpoint-url = "https://genomics.googleapis.com/"
          localization-attempts = 3
        }
        filesystems {
          gcs {
            # A reference to a potentially different auth for manipulating files via engine functions.
            auth = "service-account"
          }
        }
      }
    }
  }
}

# Google authentication
google {
  application-name = "cromwell"
  auths = [
    {
      name = "application-default"
      scheme = "application_default"
    },
    {
      name = "service-account"
      scheme = "service_account"
      service-account-id = "XXXXXXXXXXXXXX@XXXXXXXXXXXX.gserviceaccount.com"
      json-file = "/var/secrets/google/key.json"
    }
  ]
}

# database connection
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://cromwell-db/cromwell?rewriteBatchedStatements=true&useSSL=false"
    user = "XXXXXXXXXXX"
    password = "XXXXXXXXXXX"
    connectionTimeout = 5000
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}
Issue Analytics
- Created: 5 years ago
- Comments: 16 (7 by maintainers)
Top GitHub Comments
Hey @lmtani
Thanks for reporting this. There is another issue that covers a fix for the transient failure mode you’re describing; once that issue is closed, you should see this failure mode go away.
Closing this as it’s a duplicate.
Thank you for helping.
All my inputs are in buckets without Requester Pays enabled (checked with gsutil requesterpays get gs://<bucket>). I’ll set ‘project’ like you suggested. Another batch of workflows is coming in for analysis this weekend; if everything finishes successfully, there is a good chance that the missing project name was the real problem.