Solving the simultaneous singularity build using flock
Further to the discussion in #4635, I’ve been thinking about a more elegant way to solve the awkwardness of running a scatter while using Singularity on HPC. The major issues include:
- We run N `singularity build`s for a scatter over N items, which wastes time and CPU, and writing N large images to the filesystem simultaneously will presumably challenge the filesystem.
- We have to store N `.sif` images, which wastes space while the job is running.
- We have to delete the image after each `singularity build`.
My first proposed solution was #4673, which would solve the problem but requires a pull request to introduce a new hook into Cromwell, and it doesn’t look like the Cromwell team have been able to prioritise this.
My new thought is that we could use file locks (e.g. `flock` on Linux) to deal with this issue: the first worker to run creates a file lock, and all subsequent workers encounter that lock and wait until it’s released before attempting to build or run the image.
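The basic pattern is worth spelling out before wiring it into the config. This is a minimal sketch with hypothetical `/tmp` paths (on the cluster it would be a shared filesystem path): every worker opens the same lock file on file descriptor 200, `flock` blocks until the current holder’s subshell exits, and a marker check ensures the expensive work runs only once.

```shell
# Minimal flock pattern (hypothetical /tmp paths for illustration).
LOCKFILE=/tmp/image_build.lock
MARKER=/tmp/image_build.done
rm -f "$MARKER"

(
    # Blocks here until no other process holds the lock on fd 200
    flock --exclusive 200
    # Only the first worker to acquire the lock does the expensive work;
    # later workers find the marker and skip it
    if [ ! -f "$MARKER" ]; then
        echo "built once" > "$MARKER"
    fi
) 200>"$LOCKFILE"

cat "$MARKER"
```

The lock is released automatically when the subshell closes fd 200, so no explicit unlock step is needed even if the build fails.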
For example, we currently recommend this `submit-docker` configuration:
```
submit-docker = """
  # Ensure Singularity is loaded if it's installed as a module
  module load Singularity/3.0.1

  # Build the Docker image into a Singularity image
  DOCKER_NAME=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
  IMAGE=${cwd}/$DOCKER_NAME.sif
  if [ ! -f $IMAGE ]; then
    singularity pull $IMAGE docker://${docker}
  fi

  # Submit the script to SLURM
  sbatch \
    --wait \
    -J ${job_name} \
    -D ${cwd} \
    -o ${cwd}/execution/stdout \
    -e ${cwd}/execution/stderr \
    -t ${runtime_minutes} \
    ${"-c " + cpus} \
    --mem-per-cpu=${requested_memory_mb_per_core} \
    --wrap "singularity exec --bind ${cwd}:${docker_cwd} $IMAGE ${job_shell} ${script}"
"""
```
I’m instead proposing the configuration below. Note the use of a single shared image directory (`/singularity_cache` in this example), and the use of `flock` to ensure the submit scripts aren’t competing with each other:
```
submit-docker = """
  # Ensure Singularity is loaded if it's installed as a module
  module load Singularity/3.0.1

  # Determine the filepath of the image
  DOCKER_NAME=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
  IMAGE=/singularity_cache/$DOCKER_NAME.sif

  # Take an exclusive lock, so only the first job builds the image
  (
    flock --exclusive 200
    # Build the image
    if [ ! -f $IMAGE ]; then
      singularity pull $IMAGE docker://${docker}
    fi
  # Lock on the sanitised name: $IMAGE is an absolute path and can't be
  # nested under /var/lock directly
  ) 200>/var/lock/$DOCKER_NAME.lock

  # Submit the script to SLURM
  sbatch \
    --wait \
    -J ${job_name} \
    -D ${cwd} \
    -o ${cwd}/execution/stdout \
    -e ${cwd}/execution/stderr \
    -t ${runtime_minutes} \
    ${"-c " + cpus} \
    --mem-per-cpu=${requested_memory_mb_per_core} \
    --wrap "singularity exec --bind ${cwd}:${docker_cwd} $IMAGE ${job_shell} ${script}"
"""
```
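Since the cluster is down, here’s a hypothetical local smoke test of just the locking logic (the paths, `worker` function, and the `sleep` stand-in for a slow `singularity pull` are all made up for illustration): two background workers race for the lock, and the build step should run exactly once.

```shell
LOCK=/tmp/demo.lock
COUNT=/tmp/demo.count
IMAGE=/tmp/demo.sif
rm -f "$COUNT" "$IMAGE"
touch "$COUNT"

worker() {
    (
        flock --exclusive 200
        if [ ! -f "$IMAGE" ]; then
            sleep 1                     # stand-in for a slow singularity pull
            echo fake-image > "$IMAGE"
            echo built >> "$COUNT"      # record that a build happened
        fi
    ) 200>"$LOCK"
}

# Two workers race for the lock; the loser waits, then sees the image exists
worker & worker &
wait
echo "builds: $(wc -l < "$COUNT" | tr -d ' ')"
```

The second worker blocks on `flock` while the first sleeps, and by the time it acquires the lock the image file exists, so it skips the build.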
I haven’t tested this on our HPC cluster (it’s down for maintenance, sadly!), but I’m interested in whether this makes sense as something we could add to the containers tutorial to recommend to users. @illusional, @vsoch, @geoffjentry
Issue Analytics
- Created: 4 years ago
- Reactions: 1
- Comments: 36 (12 by maintainers)
Top GitHub Comments
I did not know of this thread. At our institute we have solved this differently. We use `singularity exec` and no specific pull command. This will try to locate the image in the cache, which is located in `SINGULARITY_CACHEDIR` (env variable). If it is already there it will use it; if not, it will download it. This will lead to race conditions if it is used in a scatter. We use https://github.com/biowdl/prepull-singularity to pull the images beforehand, so no race conditions occur.
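A minimal pre-pull sketch along those lines (the cache path and image list are hypothetical, and it skips gracefully when Singularity isn’t on `PATH` — this is the idea, not the prepull-singularity tool itself):

```shell
# Pre-pull every image the workflow needs, serially, on the login node,
# so that parallel tasks only ever read from a warm cache.
export SINGULARITY_CACHEDIR=/singularity_cache   # shared cache (example path)

if command -v singularity >/dev/null 2>&1; then
    for img in ubuntu:18.04 python:3.7; do       # hypothetical image list
        singularity exec "docker://$img" true    # populates the cache if missing
    done
    echo "cache warmed"
else
    echo "singularity not found, skipping pre-pull"
fi
```

Running this once before submitting the workflow means every scattered task finds the image already in `SINGULARITY_CACHEDIR` and no two tasks ever pull concurrently.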
I am also thinking of adding a `docker_pull` option to the config, so you can do `singularity exec {image} echo done!` or something similar to make sure the cache is populated at workflow initialization time. I have no ETA on this though; for now the prepull-singularity script works.

I hope you are doing the pull on a login / dev node and not on something running massively in parallel? Or that the shub:// URI is interchangeable with docker:// or library://? Doing exec/run/pull in parallel is what led to devastating events in July that warranted adding extreme limits for all users to the server, and almost was the end of Singularity Hub. Ideally this really needs to be done with just one pull, and done before anything is run in parallel.