Sometimes getting "Requester pays bucket access requires authentication." from Google

Hello,

I’m using Cromwell with Google Pipelines as the backend, and sometimes (possibly when more than 30 analyses are running at the same time) a few workflows fail (about 2 of the 30). When inspecting the metadata for a failed workflow, I can see an error message that contains “ServiceException: 401 Requester pays bucket access requires authentication.”.

Edit: Using Cromwell 35

Has anyone had a similar problem? Here is the WDL task that is affected (from Broad’s five-dollar genome workflow):

task BaseRecalibrator {
  File input_bam
  File input_bam_index
  String recalibration_report_filename
  Array[String] sequence_group_interval
  File dbSNP_vcf
  File dbSNP_vcf_index
  Array[File] known_indels_sites_VCFs
  Array[File] known_indels_sites_indices
  File ref_dict
  File ref_fasta
  File ref_fasta_index
  Int disk_size
  Int preemptible_tries

  command {
   /usr/gitc/gatk4/gatk-launch --javaOptions "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:+PrintFlagsFinal \
      -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
      -Xloggc:gc_log.log -Xms4000m" \
      BaseRecalibrator \
      -R ${ref_fasta} \
      -I ${input_bam} \
      --useOriginalQualities \
      -O ${recalibration_report_filename} \
      -knownSites ${dbSNP_vcf} \
      -knownSites ${sep=" -knownSites " known_indels_sites_VCFs} \
      -L ${sep=" -L " sequence_group_interval}
  }
  runtime {
    docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"
    memory: "6 GB"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }
  output {
    File recalibration_report = "${recalibration_report_filename}"
  }
}

And here is my Cromwell server config:

include required(classpath("application"))

webservice {
  port = 8000
}

system {
  workflow-restart = true
}

engine {
  filesystems {

    gcs {
      auth = "service-account"
    }

    http {}

    local {
      localization: [
        "hard-link", "soft-link", "copy"
      ]
    }
  }
}

backend {
  default = "Local"
  providers {

    Local {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        max-concurrent-workflows = 1
        concurrent-job-limit = 1
      }
    }

    PAPIv2 {
      actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
      config {
        project = "bioinfo-XXXXXXX"
        root = "gs://XXXXXXXX"
        genomics-api-queries-per-100-seconds = 1000
        max-concurrent-workflows = 80
        concurrent-job-limit = 200
        maximum-polling-interval = 600

        genomics {
          # Config from google stanza
          auth = "service-account"

   
          # Endpoint for APIs, no reason to change this unless directed by Google.
          endpoint-url = "https://genomics.googleapis.com/"
          localization-attempts = 3
        }

        filesystems {
          gcs {
            # A reference to a potentially different auth for manipulating files via engine functions.
            auth = "service-account"
          }
        }
      }
    }
  }
}

# Google authentication
google {
  application-name = "cromwell"
  auths = [
    {
      name = "application-default"
      scheme = "application_default"
    },
    {
      name = "service-account"
      scheme = "service_account"
      service-account-id = "XXXXXXXXXXXXXX@XXXXXXXXXXXX.gserviceaccount.com"
      json-file = "/var/secrets/google/key.json"
    }
  ]
}

# database connection
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://cromwell-db/cromwell?rewriteBatchedStatements=true&useSSL=false"
    user = "XXXXXXXXXXX"
    password = "XXXXXXXXXXX"
    connectionTimeout = 5000
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

ruchim commented, Feb 21, 2019

Hey @lmtani

Thanks for reporting this. A separate issue covers the fix for the transient failure mode you’re describing. Once that issue is closed, you should see these failures drop off.

Closing this as it’s a duplicate.

lmtani commented, Nov 1, 2018

Thank you for helping.

All my inputs are in buckets without requester pays (checked with gsutil requesterpays get gs://<bucket>).

I’ll set ‘project’ as you suggested. Another batch of workflows is scheduled for analysis this weekend. If everything finishes successfully, there is a good chance that only the project name was missing.
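
The suggestion being answered here is not part of this excerpt. Assuming it refers to the 'project' key that Cromwell's GCS filesystem configuration accepts for billing requester pays requests (an assumption to verify against the documentation for your Cromwell version; the question above uses Cromwell 35), a minimal sketch of the change against the config posted in the question might look like this. The project ID is the redacted placeholder from the original config, not a real value.

```hocon
engine {
  filesystems {
    gcs {
      auth = "service-account"
      # Assumption: project billed when the engine reads GCS objects
      project = "bioinfo-XXXXXXX"
    }
  }
}

backend {
  providers {
    PAPIv2 {
      config {
        filesystems {
          gcs {
            auth = "service-account"
            # Assumption: same billing project for backend file access
            project = "bioinfo-XXXXXXX"
          }
        }
      }
    }
  }
}
```

Only the two 'project' lines are new; everything else repeats what the posted config already contains. Whether this addresses the intermittent 401 is not established in the thread, which was closed as a duplicate of a separate issue.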
