Caper fails to launch pipeline
An exception is thrown on pipeline launch; it appears to be a Caper problem (the same output appears when running Caper without an input JSON). Running on a SLURM cluster with pipeline v1.8.0.
Caper configuration file
backend=slurm
# Define one (or both) of the following according to your
# cluster's SLURM configuration.
slurm-partition=normal
slurm-account=
# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# The "file" method was the default for all versions of Caper<1.0.
# "path+modtime" is the new default for Caper>=1.0.
# file: use md5sum hash (slow).
# path: use path.
# path+modtime: use path and modification time.
local-hash-strat=path+modtime
# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper stores all localized data files
# in this directory (e.g. input FASTQs defined as URLs in the input JSON).
local-loc-dir=/home/kdemuren/data/tmp-caper/
cromwell=/home/kdemuren/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/kdemuren/.caper/womtool_jar/womtool-52.jar
Stdout/error from pipeline run
2020-08-22 18:05:03,673|caper.caper_base|INFO| Creating a timestamped temporary directory. /home/kdemuren/data/tmp-caper/atac/20200822_180503_672685
2020-08-22 18:05:03,673|caper.caper_runner|INFO| Localizing files on work_dir. /home/kdemuren/data/tmp-caper/atac/20200822_180503_672685
2020-08-22 18:05:05,158|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2020-08-22 18:05:10,364|caper.cromwell|INFO| Womtool validation passed.
2020-08-22 18:05:10,365|caper.caper_runner|INFO| launching run: wdl=/net/bmc-pub14/data/boyer/users/kdemuren/atac-seq-pipeline/atac.wdl, inputs=/net/bmc-pub14/data/boyer/users/kdemuren/CM_ATAC_DMSO.json, backend_conf=/home/kdemuren/data/tmp-caper/atac/20200822_180503_672685/backend.conf
2020-08-22 18:05:27,113|caper.cromwell_workflow_monitor|INFO| Workflow: id=4238d010-93c3-40f2-b315-ab71b795acfc, status=Submitted
2020-08-22 18:05:27,175|caper.cromwell_workflow_monitor|INFO| Workflow: id=4238d010-93c3-40f2-b315-ab71b795acfc, status=Running
2020-08-22 18:05:30,631|caper.cromwell_workflow_monitor|INFO| Workflow: id=4238d010-93c3-40f2-b315-ab71b795acfc, status=Failed
2020-08-22 18:05:37,153|caper.cromwell_metadata|WARNING| Failed to write metadata file. workflowRoot not found. wf_id=4238d010-93c3-40f2-b315-ab71b795acfc
2020-08-22 18:05:37,153|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2020-08-22 18:05:37,153|caper.nb_subproc_thread|ERROR| Subprocess failed. returncode=1
2020-08-22 18:05:37,153|caper.cli|ERROR| Check stdout/stderr in /net/bmc-pub14/data/boyer/users/kdemuren/output-atac/CM_DMSO/cromwell.out
* Started troubleshooting workflow: id=4238d010-93c3-40f2-b315-ab71b795acfc, status=Failed
* Found failures JSON object.
[
{
"causedBy": [
{
"message": "Task raise_exception has an invalid runtime attribute memory = !! NOT FOUND !!",
"causedBy": []
}
],
"message": "Runtime validation failed"
}
]
* Recursively finding failures in calls (tasks)...
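The troubleshooter output above comes from walking Cromwell's nested `failures` JSON, where each failure can nest further causes under `causedBy`. A minimal sketch of that recursion (`collect_failures` is a hypothetical helper for illustration, not Caper's actual code):

```python
import json

def collect_failures(failures, depth=0):
    """Recursively collect messages from Cromwell's nested 'failures' JSON."""
    messages = []
    for failure in failures:
        msg = failure.get("message", "")
        if msg:
            messages.append(("  " * depth) + msg)
        # each failure may nest further causes under 'causedBy'
        messages.extend(collect_failures(failure.get("causedBy", []), depth + 1))
    return messages

# The failures object reported by the troubleshooter above:
failures_json = """
[
  {
    "causedBy": [
      {
        "message": "Task raise_exception has an invalid runtime attribute memory = !! NOT FOUND !!",
        "causedBy": []
      }
    ],
    "message": "Runtime validation failed"
  }
]
"""
for line in collect_failures(json.loads(failures_json)):
    print(line)
```

Applied to this workflow's failures object, the top-level message is the generic "Runtime validation failed" and the indented cause is the actual problem: the `memory` runtime attribute of task `raise_exception` was never resolved.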
cromwell.out
2020-08-22 18:05:12,893 INFO - Running with database db.url = jdbc:hsqldb:mem:ceaa239f-a238-454e-b534-c2601dc92129;shutdown=false;hsqldb.tx=mvcc
2020-08-22 18:05:25,396 INFO - Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
2020-08-22 18:05:25,422 INFO - [RenameWorkflowOptionsInMetadata] 100%
2020-08-22 18:05:25,583 INFO - Running with database db.url = jdbc:hsqldb:mem:aea05e05-7f06-4502-bd4c-8e5edebef296;shutdown=false;hsqldb.tx=mvcc
2020-08-22 18:05:26,180 INFO - Slf4jLogger started
2020-08-22 18:05:26,424 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO - Workflow heartbeat configuration:
{
"cromwellId" : "cromid-28ec5ec",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"failureShutdownDuration" : "5 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
2020-08-22 18:05:26,475 cromwell-system-akka.dispatchers.service-dispatcher-12 INFO - Metadata summary refreshing every 1 second.
2020-08-22 18:05:26,504 cromwell-system-akka.dispatchers.service-dispatcher-8 INFO - WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
2020-08-22 18:05:26,508 cromwell-system-akka.actor.default-dispatcher-4 INFO - KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
2020-08-22 18:05:26,526 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
2020-08-22 18:05:26,527 WARN - 'docker.hash-lookup.gcr-api-queries-per-100-seconds' is being deprecated, use 'docker.hash-lookup.gcr.throttle' instead (see reference.conf)
2020-08-22 18:05:27,000 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - JobExecutionTokenDispenser - Distribution rate: 1 per 2 seconds.
2020-08-22 18:05:27,030 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO - SingleWorkflowRunnerActor: Version 52
2020-08-22 18:05:27,043 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO - SingleWorkflowRunnerActor: Submitting workflow
2020-08-22 18:05:27,103 cromwell-system-akka.dispatchers.api-dispatcher-36 INFO - Unspecified type (Unspecified version) workflow 4238d010-93c3-40f2-b315-ab71b795acfc submitted
2020-08-22 18:05:27,131 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - SingleWorkflowRunnerActor: Workflow submitted UUID(4238d010-93c3-40f2-b315-ab71b795acfc)
2020-08-22 18:05:27,144 cromwell-system-akka.dispatchers.engine-dispatcher-33 INFO - 1 new workflows fetched by cromid-28ec5ec: 4238d010-93c3-40f2-b315-ab71b795acfc
2020-08-22 18:05:27,160 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - WorkflowManagerActor Starting workflow UUID(4238d010-93c3-40f2-b315-ab71b795acfc)
2020-08-22 18:05:27,169 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - WorkflowManagerActor Successfully started WorkflowActor-4238d010-93c3-40f2-b315-ab71b795acfc
2020-08-22 18:05:27,169 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - Retrieved 1 workflows from the WorkflowStoreActor
2020-08-22 18:05:27,202 cromwell-system-akka.dispatchers.engine-dispatcher-33 INFO - WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
2020-08-22 18:05:27,329 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - MaterializeWorkflowDescriptorActor [UUID(4238d010)]: Parsing workflow as WDL 1.0
2020-08-22 18:05:30,315 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - MaterializeWorkflowDescriptorActor [UUID(4238d010)]: Call-to-Backend assignments: atac.reproducibility_idr -> slurm, atac.fraglen_stat_pe -> slurm, atac.macs2_signal_track -> slurm, atac.idr_pr -> slurm, atac.count_signal_track -> slurm, atac.tss_enrich -> slurm, atac.pool_blacklist -> slurm, atac.idr -> slurm, atac.xcor -> slurm, atac.compare_signal_to_roadmap -> slurm, atac.filter_no_dedup -> slurm, atac.overlap -> slurm, atac.read_genome_tsv -> slurm, atac.frac_mito -> slurm, atac.reproducibility_overlap -> slurm, atac.bam2ta -> slurm, atac.idr_ppr -> slurm, atac.macs2_signal_track_pooled -> slurm, atac.gc_bias -> slurm, atac.error_input_data -> slurm, atac.preseq -> slurm, atac.filter -> slurm, atac.call_peak_ppr1 -> slurm, atac.call_peak_pr1 -> slurm, atac.pool_ta_pr2 -> slurm, atac.count_signal_track_pooled -> slurm, atac.call_peak_pooled -> slurm, atac.align_mito -> slurm, atac.align -> slurm, atac.call_peak_ppr2 -> slurm, atac.jsd -> slurm, atac.call_peak -> slurm, atac.qc_report -> slurm, atac.call_peak_pr2 -> slurm, atac.bam2ta_no_dedup -> slurm, atac.pool_ta_pr1 -> slurm, atac.pool_ta -> slurm, atac.overlap_pr -> slurm, atac.spr -> slurm, atac.annot_enrich -> slurm, atac.overlap_ppr -> slurm
2020-08-22 18:05:30,627 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - WorkflowManagerActor Workflow 4238d010-93c3-40f2-b315-ab71b795acfc failed (during InitializingWorkflowState): Task raise_exception has an invalid runtime attribute memory = !! NOT FOUND !!
2020-08-22 18:05:30,631 cromwell-system-akka.dispatchers.engine-dispatcher-32 INFO - WorkflowManagerActor WorkflowActor-4238d010-93c3-40f2-b315-ab71b795acfc is in a terminal state: WorkflowFailedState
2020-08-22 18:05:32,009 cromwell-system-akka.dispatchers.engine-dispatcher-6 INFO - Not triggering log of token queue status. Effective log interval = None
2020-08-22 18:05:32,521 cromwell-system-akka.dispatchers.engine-dispatcher-6 INFO - SingleWorkflowRunnerActor workflow finished with status 'Failed'.
2020-08-22 18:05:36,645 cromwell-system-akka.dispatchers.engine-dispatcher-68 INFO - SingleWorkflowRunnerActor writing metadata to /home/kdemuren/data/tmp-caper/atac/20200822_180503_672685/metadata.json
2020-08-22 18:05:36,676 INFO - Workflow polling stopped
2020-08-22 18:05:36,689 INFO - 0 workflows released by cromid-28ec5ec
2020-08-22 18:05:36,693 INFO - Shutting down WorkflowStoreActor - Timeout = 5 seconds
2020-08-22 18:05:36,696 INFO - Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
2020-08-22 18:05:36,698 INFO - Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
2020-08-22 18:05:36,699 cromwell-system-akka.dispatchers.engine-dispatcher-60 INFO - Aborting all running workflows.
2020-08-22 18:05:36,699 INFO - JobExecutionTokenDispenser stopped
2020-08-22 18:05:36,700 INFO - WorkflowStoreActor stopped
2020-08-22 18:05:36,705 INFO - WorkflowLogCopyRouter stopped
2020-08-22 18:05:36,705 INFO - Shutting down WorkflowManagerActor - Timeout = 3600 seconds
2020-08-22 18:05:36,705 cromwell-system-akka.dispatchers.engine-dispatcher-68 INFO - WorkflowManagerActor All workflows finished
2020-08-22 18:05:36,705 INFO - WorkflowManagerActor stopped
2020-08-22 18:05:37,015 INFO - Connection pools shut down
2020-08-22 18:05:37,017 INFO - Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
2020-08-22 18:05:37,017 INFO - Shutting down JobStoreActor - Timeout = 1800 seconds
2020-08-22 18:05:37,017 INFO - Shutting down CallCacheWriteActor - Timeout = 1800 seconds
2020-08-22 18:05:37,017 INFO - Shutting down ServiceRegistryActor - Timeout = 1800 seconds
2020-08-22 18:05:37,017 INFO - Shutting down DockerHashActor - Timeout = 1800 seconds
2020-08-22 18:05:37,017 INFO - Shutting down IoProxy - Timeout = 1800 seconds
2020-08-22 18:05:37,018 INFO - JobStoreActor stopped
2020-08-22 18:05:37,018 INFO - CallCacheWriteActor Shutting down: 0 queued messages to process
2020-08-22 18:05:37,018 INFO - SubWorkflowStoreActor stopped
2020-08-22 18:05:37,018 INFO - CallCacheWriteActor stopped
2020-08-22 18:05:37,018 INFO - WriteMetadataActor Shutting down: 0 queued messages to process
2020-08-22 18:05:37,019 INFO - KvWriteActor Shutting down: 0 queued messages to process
2020-08-22 18:05:37,021 INFO - IoProxy stopped
Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file from Caper's output directory.
2020-08-22 18:19:23,146|caper.server_heartbeat|ERROR| Failed to read from a heartbeat file. ~/.caper/default_server_heartbeat
Traceback (most recent call last):
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/bin/caper", line 13, in <module>
main()
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cli.py", line 504, in main
client(parsed_args)
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cli.py", line 269, in client
subcmd_troubleshoot(c, args)
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cli.py", line 454, in subcmd_troubleshoot
wf_ids_or_labels=args.wf_id_or_label, embed_subworkflow=True
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper_client.py", line 129, in metadata
embed_subworkflow=embed_subworkflow,
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cromwell_rest_api.py", line 144, in get_metadata
workflows = self.find(workflow_ids, labels)
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cromwell_rest_api.py", line 226, in find
CromwellRestAPI.ENDPOINT_WORKFLOWS, params=CromwellRestAPI.PARAMS_WORKFLOWS
File "/home/kdemuren/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cromwell_rest_api.py", line 299, in __request_get
) from None
Exception: Failed to connect to Cromwell server. req=GET, url=http://localhost:8000/api/workflows/v1/query
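The final exception means the caper client could not reach a Cromwell server on localhost:8000; caper troubleshoot with a WORKFLOW_ID queries a running server, which is why passing the metadata.json file directly is the fallback when the workflow was launched with caper run rather than against a persistent server. A quick way to check whether anything is answering on that port (`is_cromwell_up` is a hypothetical helper; the endpoint path is assumed from Cromwell's engine API):

```python
import urllib.request
import urllib.error

def is_cromwell_up(base_url="http://localhost:8000", timeout=3):
    """Return True if a Cromwell server answers on its engine version endpoint."""
    try:
        with urllib.request.urlopen(base_url + "/engine/v1/version", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # connection refused, timeout, DNS failure, etc.
        return False
```

If this returns False, start a server with caper server (or use the metadata.json route) before retrying the troubleshooter.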
Issue Analytics
- Created: 3 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
-lnodes=1:ppn= will be applied to Caper: https://github.com/ENCODE-DCC/caper/pull/91

Dear @leepc12, just an update to let you close the issue for me as well: it worked! Apparently there is a problem in Torque which, for some mysterious reason, leaves pipeline jobs waiting in the queue even when resources are available; after some time (or after asking the admin to force them) they end up running, and the whole analysis concluded smoothly. So the custom backend was successfully processed in the end. I have now launched the pipeline on my samples with fingers crossed. Thank you very much for the help!