
SLURM integration follow up


Hi there!

This is more of a question/discussion than an issue, but I thought this was the right place. Please forgive me if it's not 😃

Over the past few weeks we have all been working, in one way or another, on two main things:

  • Integrating the SLURM batch system into Toil
  • Running CWL jobs and workflows using cwltoil

In our team, we've so far managed to run the bcbio CWL workflow example using both cwltool and cwltoil on single machines, after the issues with --no-container and --preserve-environment were solved (see #863 and #882).

Yesterday @chapmanb also managed to run the complete workflow using SLURM, which is awesome! We were getting some strange errors about caching, but adding the --disableSharedCache option was enough to solve them.
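
For anyone wanting to reproduce this, the invocation looked roughly like the following. This is a sketch rather than the exact command we ran: the job store path and the workflow/job file names are placeholders, and it assumes cwltoil's --jobStore option.

# Sketch of a cwltoil run against Slurm; file names are placeholders.
# --batchSystem slurm selects Toil's Slurm backend, and
# --disableSharedCache works around the caching errors mentioned above.
cwltoil --batchSystem slurm --disableSharedCache \
    --jobStore ./toil-jobstore \
    main-workflow.cwl main-workflow-job.json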

We would love to keep working on this SLURM and CWL integration with Toil, so we decided to open this discussion to figure out how to proceed. What we need is mainly:

  1. Maybe a bit more documentation on how caching and I/O work in Toil. I couldn't find anything relevant in the documentation; did I skip something, @hannes-ucsc?
  2. Progress on the SLURM integration: in our group we would at least like to be able to select walltimes and queues other than the default ones. @dleehr, do you have any plans to work on this? Can we help? (See the sbatch sketch right after this list for what we mean.)
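
To make point 2 concrete, these are the two native sbatch options we would like to be able to forward through Toil; the partition name and time limit below are just examples:

# Plain Slurm submission with the two options we want to control:
# --partition selects the queue, --time sets the walltime limit.
sbatch --partition=compute --time=24:00:00 my_job.sh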

Hopefully this will be helpful to all of us, cheers!

Ping @dleehr @tetron @chapmanb @brainstorm @ohofmann

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 2
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

chapmanb commented, Jun 3, 2016 (3 reactions)

Thanks, Guillermo, for starting this. I moved the cache discussion to #928 with more details so this issue can focus on Slurm. For Slurm, we'd like to be able to pass arbitrary options so that users can tweak their submissions for their particular Slurm setup (queues and time limits are probably the first two options we'll need).

Looking at the SGE implementation, that appears to be possible through special environment variables (TOIL_GRIDENGINE_PE) and an internal mechanism (self.boss.environment):

https://github.com/BD2KGenomics/toil/blob/master/src/toil/batchSystems/gridengine.py#L161

It would be great to know the recommended approach here; we could then extend the Slurm backend to allow the same configurability through the same mechanisms, for consistency. Dan, happy to coordinate with the work you're planning. Thanks all.
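
For reference, here is what the SGE pattern looks like from the user side, together with a purely hypothetical Slurm analogue; TOIL_SLURM_ARGS is a name sketched for this discussion, not an existing Toil option:

# SGE today: Toil picks the parallel environment up from this variable.
export TOIL_GRIDENGINE_PE=smp
# Hypothetical Slurm analogue: forward arbitrary sbatch options the same way.
export TOIL_SLURM_ARGS="--partition=compute --time=24:00:00"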

brainstorm commented, Jul 22, 2016 (1 reaction)

@cjfields, I heard from @ohofmann that you're interested in having SLURM support in Toil as well. Here are some pointers @chapmanb and I are working from, should you (or anyone else) want to jump in on our testing/dev efforts:

https://forums.docker.com/t/docker-volumes-flapping-between-start-and-stop-states-while-the-container-is-running/10641

Ignore the Docker parts for now. To run on your cluster/machine you would need to:

# Install Miniconda (macOS installer shown; pick the one for your platform)
curl https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh | bash
# Install cwltool from the bioconda channel
conda install -c bioconda cwltool
# Fetch and unpack the bcbio CWL test workflow
wget https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz && tar xvfz test_bcbio_cwl.tar.gz && cd test_bcbio_cwl
# Patch cwltool to skip the Docker UID lookup (runs containers as UID 0)
sed -ie 's/docker_vm_uid() or os.geteuid()/0/g' $HOME/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/job.py
# Run the test workflow with cwltool
chmod +x *.sh && ./run_cwltool.sh

Once that works, you can try out the following bcbio_vm.py convenience wrapper commands:

https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-bcbio-cwl-on-toil
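
If I remember the wrapper syntax correctly (please double-check against the docs above), the Toil run boils down to something like this, with the workflow directory name as a placeholder:

# Run a bcbio-generated CWL workflow through Toil; arguments after "--"
# are passed through to Toil itself (directory name is a placeholder).
bcbio_vm.py cwlrun toil sample-workflow -- --batchSystem slurm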
