
Add support for staging INPUTS/OUTPUTS from/to AWS S3

See original GitHub issue

Functionality is needed to stage data files to/from S3 on a per-task level.

For the example WDL file:

task copy_file {
  String output_file
  File input_file

  command {
    cp ${input_file} ${output_file}
  }
  runtime {
    docker: "ubuntu:latest"
  }
  # declare the copied file as an output so the engine knows to delocalize it
  output {
    File out = output_file
  }
}

workflow wf_copy_file {
  call copy_file
}

and the corresponding inputs.json:

{
  "wf_cop_file.copy_file.input_file": "s3://myBucket/hello.txt",
  "wf_cop_file.copy_file.output_file": "greetings.txt"
}

The workflow execution should copy the input file from S3 into the task's working directory, and copy the output file “greetings.txt” to the S3 bucket configured for logs and outputs. An example of the files written to the output bucket:

# $WF_ID is the workflow identifier (e.g. "E6D5143C-89BC-4823-AED7-2A6AE00A1C2B")
s3://cromwell-output-bucket/$WF_ID/copy_file/outputs/greetings.txt
s3://cromwell-output-bucket/$WF_ID/copy_file/wf_copy_file-rc.txt
s3://cromwell-output-bucket/$WF_ID/copy_file/wf_copy_file-stdout.txt
s3://cromwell-output-bucket/$WF_ID/copy_file/wf_copy_file-stderr.txt
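
This is not Cromwell's actual implementation, but as a rough sketch the per-task staging could look like the following shell wrapper, assuming the AWS CLI is available in the task environment and reusing the bucket names and $WF_ID from the example above (the rc/stdout/stderr file names are illustrative):

# Illustrative sketch only, not Cromwell's generated script
WF_ID="E6D5143C-89BC-4823-AED7-2A6AE00A1C2B"
OUT="s3://cromwell-output-bucket/${WF_ID}/copy_file"

# Localize: stage the S3 input into the task working directory
aws s3 cp "s3://myBucket/hello.txt" hello.txt

# Run the task's command block, capturing logs and the return code
( cp hello.txt greetings.txt ) > stdout.txt 2> stderr.txt
echo $? > rc.txt

# Delocalize: push the declared output, return code, and logs back to S3
aws s3 cp greetings.txt "${OUT}/outputs/greetings.txt"
aws s3 cp rc.txt        "${OUT}/wf_copy_file-rc.txt"
aws s3 cp stdout.txt    "${OUT}/wf_copy_file-stdout.txt"
aws s3 cp stderr.txt    "${OUT}/wf_copy_file-stderr.txt"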

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:12 (3 by maintainers)

Top GitHub Comments

1 reaction
geoffjentry commented, Jun 29, 2018

@delagoya as of not too long ago you can override the default bash usage in favor of the shell of your choice
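
For reference, the knob being described here appears to be Cromwell's system.job-shell configuration setting; a minimal sketch of overriding it at launch, with the workflow and inputs file names from this issue used as stand-in arguments:

# Assumes the system.job-shell setting documented for recent Cromwell releases
java -Dsystem.job-shell=/bin/sh -jar cromwell.jar run wf_copy_file.wdl --inputs inputs.json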

0 reactions
brainstorm commented, Aug 15, 2018

@elerch Careful with that always-on restart policy from Docker. In my experience, it did not re-read env-files (in my case those env vars are sitting in the host's /etc/defaults/ecs). I expected SIGHUP-like behavior when changing ecs-agent attributes like ECS_CLUSTER, i.e.:

https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.umccrise-docker/files/bootstrap_instance.sh#L39

Instead, I had to resort to a systemd service that re-runs the ecs-agent docker container on boot:

https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.ecs-agent/tasks/main.yml#L75
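
A minimal sketch of that workaround, under the assumption that removing the container and starting a fresh one is what forces --env-file to be re-read (the linked playbook's real invocation passes additional volumes and flags):

# A plain restart reuses the old environment; a fresh `docker run` picks up changes
docker rm -f ecs-agent 2>/dev/null || true
docker run --detach --name ecs-agent \
  --env-file=/etc/defaults/ecs \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  amazon/amazon-ecs-agent:latest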

Read more comments on GitHub >

Top Results From Across the Web

Input and output artifacts - AWS CodePipeline
Stages use input and output artifacts that are stored in the Amazon S3 artifact ... as an input artifact to the Deploy stage,...

Create a pipeline that uses Amazon S3 as a deployment ...
The pipeline then uses Amazon S3 to deploy the files to your bucket. ... In Step 2: Add source stage, in Source provider,...

Deploy artifacts to Amazon S3 in different accounts using ...
1. Open the Amazon S3 console in the development account. 2. In the Bucket name list, choose your development input S3 bucket. For...

Tutorial: Create a simple pipeline (S3 bucket)
Follow the steps in this CodePipeline tutorial to create a simple two-stage pipeline using an S3 bucket as a code repository.

Staging Data and Tables with Pipeline Activities
AWS Data Pipeline can stage input and output data in your pipelines to make it easier to use certain activities, such as ShellCommandActivity...
