
Solving the simultaneous singularity build using flock


Further to the discussion in #4635, I’ve been thinking about a more elegant way to solve the awkwardness of running a scatter while using Singularity on HPC. The major issues include:

  • For a scatter over N items we run N Singularity builds, which wastes time and CPU, and writing N large images to the filesystem simultaneously will presumably strain it.
  • We have to store N .sif images, which wastes space while the jobs are running.
  • We have to delete the image after each Singularity build.

My first proposed solution was #4673, which would solve the problem but would require a pull request introducing a new hook into Cromwell, and it doesn’t look like the Cromwell team have been able to prioritise this.

My new thought is that we could use file locks (e.g. flock on Linux) to deal with this issue: the first worker to run acquires an exclusive lock on a lock file, and all subsequent workers block on that lock, waiting until it is released before attempting to build or run the image.
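
In isolation, the pattern is just a subshell that holds an exclusive lock on a dedicated file descriptor for the duration of the critical section, with the lock released automatically when the subshell exits. A minimal sketch (the paths here are placeholders, just to show the mechanics):

        # Minimal flock sketch (placeholder paths): only one process at a time
        # enters the critical section; any others block until the lock is free.
        (
            flock --exclusive 200                 # blocks until FD 200's lock is free
            if [ ! -f /shared/image.sif ]; then
                echo "building image..."          # stand-in for the actual build
            fi
        ) 200>/shared/image.sif.lock              # lock file backing FD 200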

For example, we currently recommend this submit-docker configuration:

        submit-docker = """
            # Ensure singularity is loaded if it's installed as a module
            module load Singularity/3.0.1
            
            # Build the Docker image into a singularity image
            DOCKER_NAME=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
            IMAGE=${cwd}/$DOCKER_NAME.sif
            if [ ! -f $IMAGE ]; then
                singularity pull $IMAGE docker://${docker}
            fi

            # Submit the script to SLURM
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${cwd}/execution/stdout \
              -e ${cwd}/execution/stderr \
              -t ${runtime_minutes} \
              ${"-c " + cpus} \
              --mem-per-cpu=${requested_memory_mb_per_core} \
              --wrap "singularity exec --bind ${cwd}:${docker_cwd} $IMAGE ${job_shell} ${script}"
        """

I’m instead proposing the configuration below. Note the use of a single shared image directory (/singularity_cache in this example), and the use of flock to ensure the submit scripts don’t race each other to build the same image:

        submit-docker = """
            # Ensure singularity is loaded if it's installed as a module
            module load Singularity/3.0.1
            
            # Determine the filepath to the image
            DOCKER_NAME=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
            IMAGE=/singularity_cache/$DOCKER_NAME.sif

            # Wait for an exclusive lock before building the image, so that
            # concurrent submit scripts don't pull the same image at once.
            # The lock is released automatically when the subshell exits.
            (
                flock --exclusive 200
                # Build the image only if another job hasn't already done so
                if [ ! -f "$IMAGE" ]; then
                    singularity pull "$IMAGE" docker://${docker}
                fi
            ) 200>/var/lock/$DOCKER_NAME.lock

            # Submit the script to SLURM
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${cwd}/execution/stdout \
              -e ${cwd}/execution/stderr \
              -t ${runtime_minutes} \
              ${"-c " + cpus} \
              --mem-per-cpu=${requested_memory_mb_per_core} \
              --wrap "singularity exec --bind ${cwd}:${docker_cwd} $IMAGE ${job_shell} ${script}"
        """

I haven’t tested this on our HPC cluster (it’s down for maintenance, sadly!), but I’m interested in whether this makes sense as something we could add to the containers tutorial and recommend to users. @illusional, @vsoch @geoffjentry
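
Until the cluster is back, the locking behaviour itself can be exercised locally with no Singularity involved. A rough sketch (placeholder paths, with sleep standing in for the pull) should show only the first worker doing the build:

        # Rough local check of the locking (placeholder paths, no Singularity):
        # launch three contenders at once; only the first should "build".
        LOCK=/tmp/demo_image.lock
        MARKER=/tmp/demo_image.sif
        rm -f "$MARKER"
        for i in 1 2 3; do
            (
                flock --exclusive 200
                if [ ! -f "$MARKER" ]; then
                    echo "worker $i builds the image"
                    sleep 2                        # stand-in for singularity pull
                    touch "$MARKER"
                else
                    echo "worker $i reuses the existing image"
                fi
            ) 200>"$LOCK" &
        done
        wait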

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 36 (12 by maintainers)

Top GitHub Comments

3 reactions
rhpvorderman commented, May 12, 2020

I did not know of this thread. At our institute we have solved this differently: we use singularity exec with no explicit pull command. Singularity then looks for the image in the cache located at SINGULARITY_CACHEDIR (an environment variable); if it is already there it will be reused, and if not it will be downloaded. This leads to a race condition when used in a scatter.
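
As a rough illustration of that behaviour (example path and image, not our exact setup):

        # With SINGULARITY_CACHEDIR set, exec pulls into the cache on first use
        # and reuses the cached image afterwards (example path and image).
        export SINGULARITY_CACHEDIR=/shared/singularity_cache
        singularity exec docker://ubuntu:20.04 echo "image is now cached"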

We use https://github.com/biowdl/prepull-singularity to pull the images beforehand, so no race conditions occur.

I am also thinking of adding a docker_pull option to the config, so you can do singularity exec {image} echo done! (or something similar) to make sure the cache is populated at workflow initialisation time. I have no ETA on this though; for now the prepull-singularity script works.
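
A minimal version of that cache-warming step, run once on a login node before the workflow starts, might look like this (the image list is illustrative):

        # Warm the Singularity cache serially, before anything runs in parallel
        # (illustrative image list).
        export SINGULARITY_CACHEDIR=/shared/singularity_cache
        for img in docker://ubuntu:20.04 docker://python:3.8-slim; do
            singularity exec "$img" echo "done: $img"
        done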

1 reaction
vsoch commented, Oct 1, 2019

I hope you are doing the pull on a login / dev node and not on something running massively in parallel? Or that the shub:// URI is interchangeable with docker:// or library://? Doing exec/run/pull in parallel is what led to devastating events in July that warranted adding extreme limits for all users to the server, and was almost the end of Singularity Hub. Ideally this really needs to be done with just one pull, before anything is run in parallel.
