SLURM integration follow up
Hi there!
This is more a question/discussion than an issue, but I thought this was the right place. Please forgive me if it’s not 😃
Over the past few weeks we have all been working, in some way or another, mainly on two things:
- Integrating the SLURM batch system into Toil
- Running CWL jobs and workflows using cwltoil
In our team, so far we’ve managed to run the bcbio CWL workflow example using both cwltool and cwltoil on single machines, after the issues with --no-container and --preserve-environment were solved (see #863 and #882).
Yesterday @chapmanb also managed to run the complete workflow using SLURM, which is awesome! We were getting some strange errors about caching, but adding the --disableSharedCache option was enough to solve them.
We would love to continue the work on this SLURM and CWL integration with Toil, so we opened this discussion to decide how to proceed. What we mainly need is:
- Maybe a bit more documentation on how caching and I/O work in Toil. I couldn’t find anything relevant in the documentation; did I miss something, @hannes-ucsc?
- Progress on the SLURM integration: In our group we would like to at least be able to select walltime and queues other than the default ones. @dleehr do you have any plans on working on this? Can we help?
Hopefully this will be helpful to all of us, cheers!
Top GitHub Comments
Thanks Guillermo for starting this. I moved the cache discussion to #928 with more details so this PR can focus on Slurm. For Slurm, we’d like to be able to pass arbitrary options so that users can tweak their submissions for their particular Slurm setup (queues and time limits are probably the first two options we’ll need).
Looking at the SGE implementation, it looks like that’s possible through special environment variables (TOIL_GRIDENGINE_PE) and some kind of internal mechanism (self.boss.environment): https://github.com/BD2KGenomics/toil/blob/master/src/toil/batchSystems/gridengine.py#L161
It would be great to know the recommended approach for doing this, so that we could extend Slurm to allow configuration through the same mechanisms for consistency. Dan, happy to coordinate with the work you’re planning. Thanks all.
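To make the idea concrete, here is a rough sketch of how extra Slurm options could be picked up from an environment variable, in the same spirit as TOIL_GRIDENGINE_PE. This is not Toil's actual code: the variable name TOIL_SLURM_ARGS and the function build_sbatch_command are hypothetical names chosen for illustration. Only the sbatch flags themselves (--partition, --time, --mem, --cpus-per-task, --wrap) are standard Slurm options.

```python
import os
import shlex


def build_sbatch_command(job_command, cpus, mem_mb, environ=None):
    """Assemble an sbatch argument list for a single job.

    TOIL_SLURM_ARGS is a hypothetical variable name used only for this
    sketch; it holds extra site-specific sbatch options as one string.
    """
    environ = os.environ if environ is None else environ

    # Resources the batch system computes itself from the job description.
    cmd = ["sbatch", f"--cpus-per-task={cpus}", f"--mem={mem_mb}"]

    # User-supplied extras, e.g. "--partition=long --time=02:00:00".
    extra = shlex.split(environ.get("TOIL_SLURM_ARGS", ""))

    # Refuse options that would silently override the managed resources.
    managed = {"--mem", "--cpus-per-task"}
    for arg in extra:
        if arg.split("=", 1)[0] in managed:
            raise ValueError(f"{arg!r} is managed by the batch system")

    cmd.extend(extra)
    # --wrap lets sbatch run a plain command string without a script file.
    cmd.append("--wrap=" + job_command)
    return cmd
```

With TOIL_SLURM_ARGS="--partition=long --time=02:00:00" exported before launching the workflow, every submitted job would get that queue and walltime, while memory and CPU requests would still come from the job itself.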
@cjfields, I heard from @ohofmann that you’re interested in having SLURM support in Toil as well. Here are some pointers to what @chapmanb and I are working on, should you (or anyone else) want to jump in on our testing/dev efforts:
https://forums.docker.com/t/docker-volumes-flapping-between-start-and-stop-states-while-the-container-is-running/10641
Ignore the Docker parts for now. To run on your cluster/machine you would need to:
Once that works, you can try out the following bcbio_vm.py convenience wrapper commands: https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-bcbio-cwl-on-toil