Pipeline on GCP fails with "Error: pipeline dependencies not found"
Describe the bug
I’ve submitted 25 ChIP-seq jobs to Caper. The jobs begin running, but roughly halfway through, the Caper server dies suddenly. Grepping the logs for “error”, I find that all of the job logs (in cromwell-workflow-logs/) contain “Error: pipeline dependencies not found”.
I have consulted Issue #172, and I have verified that I activated the encode-chip-seq-pipeline environment both when launching the Caper server and when submitting the jobs. I am also seeing these failures on GCP but not on macOS, so I felt it was prudent to open a new issue.
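Concretely, the steps I follow look roughly like this (the chip.wdl path and the input JSON filename are placeholders, not my exact paths):

```
# Shell 1: start the Caper server with the pipeline environment active
conda activate encode-chip-seq-pipeline
caper server > caper_server.log 2>&1 &

# Shell 2: submit each job, also with the environment active
conda activate encode-chip-seq-pipeline
caper submit chip.wdl -i input.json
```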
OS/Platform
- OS/Platform: Google Cloud
- Conda version: 4.7.12
- Pipeline version: I’m not sure how to check this, sorry
- Caper version: 1.4.2
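For the record, one way I could try to check the pipeline version, assuming the pipeline was cloned from GitHub (an assumption on my part about how it was installed), would be:

```
cd chip-seq-pipeline2   # path to the cloned pipeline repo (assumed)
git describe --tags     # prints the checked-out release tag, if any
```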
Caper configuration file
backend=gcp
gcp-prj=gbsc-gcp-lab-kundaje
tmp-dir=/data/tmp_amtseng
singularity-cachedir=/data/singularity_cachedir_amtseng
file-db=/data/caper_db/caper_file_db_amtseng
db-timeout=120000
max-concurrent-tasks=1000
max-concurrent-workflows=50
use-google-cloud-life-sciences=True
gcp-region=us-central1
Input JSON file
Here is the input JSON for one of the 25 submitted jobs.
{
  "chip.title": "A549_cJun_FLAG cells untreated",
  "chip.description": "A549_cJun_FLAG cells untreated",
  "chip.pipeline_type": "tf",
  "chip.aligner": "bowtie2",
  "chip.align_only": false,
  "chip.true_rep_only": false,
  "chip.genome_tsv": "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv",
  "chip.paired_end": false,
  "chip.ctl_paired_end": false,
  "chip.always_use_pooled_ctl": true,
  "chip.align_cpu": 4,
  "chip.call_peak_cpu": 4,
  "chip.fastqs_rep1_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090532.fastq.gz"
  ],
  "chip.fastqs_rep2_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090533.fastq.gz"
  ],
  "chip.fastqs_rep3_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090534.fastq.gz"
  ],
  "chip.ctl_fastqs_rep1_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090601.fastq.gz"
  ],
  "chip.ctl_fastqs_rep2_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090602.fastq.gz"
  ],
  "chip.ctl_fastqs_rep3_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090603.fastq.gz"
  ]
}
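As a side note, a minimal local sanity check of an input JSON like the one above might look like the following. The inline document is a trimmed copy of just two of the fastq fields; nothing here talks to GCS, it only confirms the JSON parses and every fastq entry is a gs:// URI.

```shell
# Sketch: validate fastq URIs in an input JSON before submitting.
# The JSON here is embedded inline to keep the example self-contained.
fastq_count=$(python3 - <<'EOF'
import json
cfg = json.loads("""
{
  "chip.fastqs_rep1_R1": ["gs://caper_in/amtseng/AP1/fastqs/SRR12090532.fastq.gz"],
  "chip.ctl_fastqs_rep1_R1": ["gs://caper_in/amtseng/AP1/fastqs/SRR12090601.fastq.gz"]
}
""")
# Collect every URI from every *fastqs* key and check the scheme.
urls = [u for k, v in cfg.items() if "fastqs" in k for u in v]
assert all(u.startswith("gs://") for u in urls), "non-GCS fastq path"
print(len(urls))
EOF
)
echo "$fastq_count"
```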
Troubleshooting result
Unfortunately, because the Caper server dies, I am unable to use caper troubleshoot {jobID} to diagnose.
Instead, I’ve attached the Cromwell log for the job (workflow.3d1cb136-9b32-4514-9a33-3262d8303d6f.log); the tail of that log shows the failure. I’ve also attached cromwell.out.
Thanks!
Issue Analytics
- Created 2 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
That command line looks good, provided your Google user account has sufficient permissions for GCE, GCS, the Google Life Sciences API, and so on.
Why not use a configuration file, ~/.caper/default.conf? You can generate a good template of it by running the following:
BTW, I strongly recommend using the above shell script, because ENCODE DCC runs thousands of pipelines without any problem on instances created by that script.
I’m not sure whether you have a service account with the correct permission settings. Please use the above script.
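One quick way to check which account is active and what roles it holds (a sketch using standard gcloud commands; the project ID is taken from the Caper config above, and YOUR_ACCOUNT is a placeholder for your user or service account):

```
# Which account is gcloud currently authenticated as?
gcloud auth list

# Which roles does that member hold on the project?
gcloud projects get-iam-policy gbsc-gcp-lab-kundaje \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:YOUR_ACCOUNT"
```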
Ah, I’m sorry, I misunderstood which script you were referring to. I’ll try to create an instance using create_instance.sh instead of the pre-existing instance we have in the lab.