Unable to run the voltaml/volta_diffusion:v0.1 docker image

-> % sudo docker run -it --gpus all voltaml/volta_diffusion:v0.1 bash
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/e049fdb3bc56fecdeefb3b950034cbc757eeb166b152330d00ef6e8a2972af06/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

This is probably because, when --gpus=all is specified, the NVIDIA container hook (nvidia-container-cli in the error above) tries to mount the host's NVIDIA and CUDA libraries into the container. Some of those paths already exist in the image (e.g. /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1), apparently as links rather than regular files, so the mount fails with "file exists".
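
One way to check this, as a rough sketch: start the image without --gpus so the hook does not run, and list the driver libraries already baked into it. The function name list_baked_nvidia_libs is made up for illustration. It assumes Docker's default runtime is not set to nvidia and that the image accepts a command override, as the bash invocation above suggests.

```shell
#!/bin/sh
# Hypothetical helper: list NVIDIA/CUDA driver libraries that already exist
# inside an image. No --gpus flag is passed, so (assuming the default runtime
# is not nvidia) the NVIDIA hook does not run and the container can start.
list_baked_nvidia_libs() {
    image="$1"
    # The glob is expanded inside the container by its own shell.
    docker run --rm "$image" sh -c \
        'ls -l /usr/lib/x86_64-linux-gnu/libnvidia-*.so* /usr/lib/x86_64-linux-gnu/libcuda.so*'
}
```

Calling list_baked_nvidia_libs voltaml/volta_diffusion:v0.1 and seeing entries such as libnvidia-ml.so.1 would support the explanation above; ls -l marks symbolic links with a leading l.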

Could you please open-source the Dockerfile as well?

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
Pop115 commented, Nov 25, 2022

Same issue here. I found a related issue on the nvidia-docker repo: https://github.com/NVIDIA/nvidia-docker/issues/1551

I made a Dockerfile containing this:

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

and built it with docker build -t voltaml/volta_diffusion -f Dockerfile .

And it seems to work.
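
For anyone reproducing this, a minimal sketch of what the full Dockerfile might look like. The comment only shows the RUN line, so the FROM line here is an assumption (the image tag is taken from the issue title), and the paths are copied from the RUN line quoted above.

```shell
#!/bin/sh
# Write a Dockerfile that layers the workaround on top of the published image.
# ASSUMPTION: the FROM line is not shown in the comment; it is inferred from
# the image named in the issue title.
cat > Dockerfile <<'EOF'
FROM voltaml/volta_diffusion:v0.1
RUN rm -rf \
    /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
    /usr/lib/x86_64-linux-gnu/libcuda.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
EOF

# Then build it with the command from the comment:
#   docker build -t voltaml/volta_diffusion -f Dockerfile .
```

Removing these files from the image leaves the paths free, so the libraries mounted in by --gpus all should no longer collide with them.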

1 reaction
JackCloudman commented, Nov 25, 2022

Download this file: https://gist.github.com/JackCloudman/7143c7aeaafa54ed35b3f6cfe8a30c57

docker build -t voltaml/volta_diffusion:v0.1 -f Dockerfile .
docker run -it --gpus=all -p "8888:8888" voltaml/volta_diffusion:v0.1 jupyter lab --port=8888 --no-browser --ip 0.0.0.0 --allow-root