GCS URLs that are not valid URIs are not supported

The following path isn’t copying into the task correctly:

gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar

It gives the following message in the JES logs:

2017/03/20 15:37:09 I: Docker file /cromwell_root/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar maps to host location /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar.
2017/03/20 15:37:09 I: Copying gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar to /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
2017/03/20 15:37:09 I: Running command: sudo gsutil -q -m cp gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
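The container path in these logs (`/cromwell_root/gs:/bg_tag_team/...`, with a single slash after `gs:`) is what you would get if the `gs://` string were joined onto the call root as an ordinary relative path and then normalized. The following Python sketch only illustrates that effect; it is not Cromwell's code, and the helper name `localize_as_relative` is hypothetical.

```python
import posixpath


def localize_as_relative(call_root: str, path: str) -> str:
    """Join `path` onto `call_root` as if it were a plain relative path.

    Hypothetical helper for illustration only. Normalizing the joined
    path collapses the "//" in "gs://" into a single "/".
    """
    return posixpath.normpath(posixpath.join(call_root, path))


jar = "gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar"
print(localize_as_relative("/cromwell_root", jar))
# prints /cromwell_root/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
```

The printed path matches the "Docker file" location reported in the first log line, which is consistent with the jar being handled as a relative path rather than as a GCS object.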

And the exec.sh that it generates is:

#!/bin/bash
export _JAVA_OPTIONS=-Djava.io.tmpdir=/cromwell_root/tmp
export TMPDIR=/cromwell_root/tmp

(
cd /cromwell_root
if [ false = false ]; \
  then java -Xmx1g -jar gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar PadTargets --targets /cromwell_root/broad-dsde-methods/th/target/ice_targets.tsv --output targets.padded.tsv \
        --padding 250 --help false --version false --verbosity INFO --QUIET false; \
  else touch targets.padded.tsv; \
fi
)
echo $? > /cromwell_root/PadTargets-rc.txt.tmp
(
cd /cromwell_root

)
mv /cromwell_root/PadTargets-rc.txt.tmp /cromwell_root/PadTargets-rc.txt

The WDL that has this issue is:

workflow BrokenFilePath {
  File targets = "gs://broad-dsde-methods/th/target/ice_targets.tsv"
  File GATK_protected_jar = "gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar"
  Boolean isWGS = false
  Int padding = 250

  call PadTargets {
    input:
        target_file=targets,
        gatk_jar=GATK_protected_jar,
        isWGS=isWGS,
        mem=1,
        padding=padding
  }
}

task PadTargets {
    File target_file
    Int padding
    File gatk_jar
    Boolean isWGS
    Int mem

    command {
        if [ ${isWGS} = false ]; \
          then java -Xmx${mem}g -jar ${gatk_jar} PadTargets --targets ${target_file} --output targets.padded.tsv \
                --padding ${padding} --help false --version false --verbosity INFO --QUIET false; \
          else touch targets.padded.tsv; \
        fi
    }

    output {
        File padded_target_file = "targets.padded.tsv"
    }

    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.4-1469632282"
      zones: "us-central1-a us-central1-b"
      disks: "local-disk 200 SSD"
      memory: "6G"
    }
}

Note that I have not tried running this with version 25. The problem occurred both on version 24 and in FireCloud.

@ruchim @Horneth

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

mcovarr commented, Jun 14, 2017 (1 reaction)

I should have made this explicit in the previous comment:

I don’t currently see a way we can support underscores in bucket names as long as we’re using Google’s GCS NIO filesystem. But I do think Cromwell can and should fail with useful and timely error messages when presented with bucket names that will not work.

mcovarr commented, Jun 14, 2017 (0 reactions)

The current JesExpressionFunctions#preMapping logic does not distinguish between a path that fails to parse as a valid GCS URI for being relative to the call root and a path that fails to parse for being a fully qualified gs://uri_with_underscores/file.txt containing invalid characters in the bucket name.
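One way to picture the missing distinction is a three-way classification done before any path mapping: a usable `gs://` URI, a `gs://` URI whose bucket cannot be parsed as a URI host, and a genuinely relative path. The Python sketch below is an illustration under stated assumptions, not Cromwell's implementation. The names `PathKind`, `classify` and `require_supported` are hypothetical, and the host check (labels made of letters, digits and hyphens) is an assumed stand-in for whatever the GCS NIO filesystem's URI parsing actually enforces.

```python
import re
from enum import Enum

# Assumption: a bucket is usable only if it looks like a URI host, i.e.
# dot-separated labels of letters, digits and hyphens. GCS itself allows
# underscores in bucket names, which is exactly the mismatch at issue.
_HOST_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


class PathKind(Enum):
    GCS_URI = "valid gs:// URI"
    UNSUPPORTED_GCS_URI = "gs:// URI whose bucket is not a valid URI host"
    RELATIVE = "path relative to the call root"


def classify(path: str) -> PathKind:
    """Separate relative paths from fully qualified but unparseable gs:// URIs."""
    if not path.startswith("gs://"):
        return PathKind.RELATIVE
    bucket = path[len("gs://"):].split("/", 1)[0]
    if bucket and all(_HOST_LABEL.match(label) for label in bucket.split(".")):
        return PathKind.GCS_URI
    return PathKind.UNSUPPORTED_GCS_URI


def require_supported(path: str) -> str:
    """Fail early with a descriptive error instead of mislocalizing the file."""
    if classify(path) is PathKind.UNSUPPORTED_GCS_URI:
        raise ValueError(
            f"Unsupported GCS path {path!r}: the bucket name is not a valid "
            "URI host (for example, it contains an underscore)"
        )
    return path
```

With this sketch, `gs://broad-dsde-methods/th/target/ice_targets.tsv` classifies as a valid URI (the underscore is in the object name, not the bucket), `targets.padded.tsv` classifies as relative, and the `gs://bg_tag_team/...` jar path raises an error up front rather than being copied to a path under the call root.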
