GCS URLs that are not valid URIs are not supported
The following path isn't being localized into the task correctly:
gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
It produces the following messages in the JES logs:
2017/03/20 15:37:09 I: Docker file /cromwell_root/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar maps to host location /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar.
2017/03/20 15:37:09 I: Copying gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar to /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
2017/03/20 15:37:09 I: Running command: sudo gsutil -q -m cp gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar /mnt/local-disk/gs:/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar
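Judging from these log lines, the fully qualified jar URL was treated as if it were a path relative to /cromwell_root, so the `gs://` prefix survived as a `gs:` directory. By contrast, the valid targets URL in the same call was localized by dropping the scheme (the generated exec.sh refers to it as /cromwell_root/broad-dsde-methods/th/target/ice_targets.tsv). The bash sketch below reproduces both mappings side by side. It is not Cromwell's implementation: the function names are hypothetical, and the "expected" mapping is an assumption inferred from how the targets file was handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not Cromwell code: mimics the two mappings seen in this issue.
# Assumption: localized inputs live under /cromwell_root, as in the JES logs.
root=/cromwell_root

# Expected mapping (inferred from the targets file): strip the gs:// scheme
# and keep "<bucket>/<object path>" under the root.
expected_local_path() {
  local url=$1
  printf '%s/%s\n' "$root" "${url#gs://}"
}

# Observed mapping for the jar: the whole URL is joined onto the root as a
# relative path; squeezing repeated slashes turns "gs://" into "gs:/".
observed_local_path() {
  local url=$1
  printf '%s/%s\n' "$root" "$url" | tr -s '/'
}
```

For the jar URL, `expected_local_path` yields /cromwell_root/bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar, while `observed_local_path` yields the /cromwell_root/gs:/bg_tag_team/... path that appears in the logs above.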
The exec.sh that it generates is:
#!/bin/bash
export _JAVA_OPTIONS=-Djava.io.tmpdir=/cromwell_root/tmp
export TMPDIR=/cromwell_root/tmp
(
cd /cromwell_root
if [ false = false ]; \
then java -Xmx1g -jar gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar PadTargets --targets /cromwell_root/broad-dsde-methods/th/target/ice_targets.tsv --output targets.padded.tsv \
--padding 250 --help false --version false --verbosity INFO --QUIET false; \
else touch targets.padded.tsv; \
fi
)
echo $? > /cromwell_root/PadTargets-rc.txt.tmp
(
cd /cromwell_root
)
mv /cromwell_root/PadTargets-rc.txt.tmp /cromwell_root/PadTargets-rc.txt
The WDL that has this issue is:
workflow BrokenFilePath {
  File targets = "gs://broad-dsde-methods/th/target/ice_targets.tsv"
  File GATK_protected_jar = "gs://bg_tag_team/Tumor_Only_Resources/gatk-protected-1.0.0.0-alpha1.2.4.jar"
  Boolean isWGS = false
  Int padding = 250

  call PadTargets {
    input:
      target_file=targets,
      gatk_jar=GATK_protected_jar,
      isWGS=isWGS,
      mem=1,
      padding=padding
  }
}

task PadTargets {
  File target_file
  Int padding
  File gatk_jar
  Boolean isWGS
  Int mem

  command {
    if [ ${isWGS} = false ]; \
    then java -Xmx${mem}g -jar ${gatk_jar} PadTargets --targets ${target_file} --output targets.padded.tsv \
    --padding ${padding} --help false --version false --verbosity INFO --QUIET false; \
    else touch targets.padded.tsv; \
    fi
  }

  output {
    File padded_target_file = "targets.padded.tsv"
  }

  runtime {
    docker: "broadinstitute/genomes-in-the-cloud:2.2.4-1469632282"
    zones: "us-central1-a us-central1-b"
    disks: "local-disk 200 SSD"
    memory: "6G"
  }
}
Note that I have not tried running this with Cromwell version 25. The problem occurred both on version 24 and in FireCloud.
Issue analytics: created 7 years ago; 10 comments (5 by maintainers).
Top GitHub Comments
I should have made this explicit in the previous comment:
I don’t currently see a way we can support underscores in bucket names as long as we’re using Google’s GCS NIO filesystem. But I do think Cromwell can and should fail with useful and timely error messages when presented with bucket names that will not work.
The current JesExpressionFunctions#preMapping logic does not distinguish between a path that fails to parse as a valid GCS URI because it is relative to the call root and a path that fails to parse because it is a fully qualified gs://uri_with_underscores/file.txt whose bucket name contains invalid characters.
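The distinction described in this comment can be sketched as a pre-flight check that a user could run on File values before submitting a workflow. This is a hedged bash illustration, not Cromwell's Scala `preMapping` logic: `classify_path` is a hypothetical name, and the rule that a bucket name must use only lowercase ASCII letters, digits, hyphens, and dots to parse as a URI host is an assumption drawn from the underscore problem reported here.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not Cromwell code. Classifies a File value as
# "relative" (resolved against the call root) or "gcs" (fully qualified URL),
# and fails fast with a clear message when a gs:// bucket name would not parse
# as a URI host.
# Assumption: a usable bucket name contains only [a-z0-9.-] (no underscores).
classify_path() {
  local path=$1
  if [[ $path != gs://* ]]; then
    # No gs:// scheme: treat it as relative to the call root.
    echo "relative"
    return 0
  fi
  local rest=${path#gs://}
  local bucket=${rest%%/*}   # everything before the first "/" is the bucket
  if [[ -z $bucket ]]; then
    echo "error: no bucket name in '$path'" >&2
    return 1
  fi
  if [[ ! $bucket =~ ^[a-z0-9.-]+$ ]]; then
    echo "error: bucket '$bucket' in '$path' is not a valid URI host name" >&2
    return 1
  fi
  echo "gcs"
}
```

With this sketch, the targets URL from the WDL classifies as `gcs`, `targets.padded.tsv` classifies as `relative`, and the jar URL in bucket `bg_tag_team` exits with status 1 and an error naming the offending bucket, instead of being silently mapped to a nonexistent local path.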