Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Docker install not finding GPUs

See original GitHub issue

On a p3.2xlarge, the GPU Docker install (built with the --gpu flag) is not finding GPU devices. From ts_log.log - note the “Number of GPUs” line:

model-server@a8819511cfaa:~/logs$ more ts_log.log
2020-04-16 01:53:17,099 [INFO ] main org.pytorch.serve.ModelServer -
TS Home: /usr/local/lib/python3.6/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 8
Max heap size: 13644 M
Python executable: /usr/bin/python3
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 8
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

When starting on a GPU machine with CUDA drivers installed, I normally see Number of GPUs: 4.
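
A common reason for seeing Number of GPUs: 0 inside the container is that the container was started without exposing the host GPUs to the Docker runtime; building with the --gpu flag only changes what goes into the image. A minimal sketch of a GPU-enabled start, assuming Docker 19.03+ with the NVIDIA Container Toolkit installed on the host and a locally built image tagged torchserve:gpu (the tag is an assumption, not taken from the issue; the ports match the config.properties shown above):

# Confirm the host driver sees the GPUs first.
nvidia-smi

# Start the GPU build with all host GPUs exposed to the container.
# --gpus all needs Docker 19.03+ plus the NVIDIA Container Toolkit;
# older setups use --runtime=nvidia instead.
docker run --rm -it --gpus all \
    -p 8080:8080 -p 8081:8081 \
    torchserve:gpu

If the devices are visible inside the container, the "Number of GPUs" line in ts_log.log should report a non-zero count.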

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

1 reaction
fbbradheintz commented on Apr 17, 2020

That didn’t work either. On a p3.8xlarge (which has 4 CUDA-compatible GPUs), I built the Docker image with the --gpu flag, as the instructions prescribe. I then ran the same command @harshbafna listed above and got this result:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\\\n\\\"\"": unknown.

This is on DL AMI 27, Ubuntu 18, current master branch.
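
The nvidia-container-cli: initialization error above is raised by the prestart hook on the host, before anything in the TorchServe image runs, which points at the driver or the NVIDIA Container Toolkit setup rather than the image itself. One way to isolate it, sketched under the assumption that the nvidia/cuda:10.2-base tag is available (any CUDA base image works):

# 1. Does the host driver see the GPUs?
nvidia-smi

# 2. Does the Docker + NVIDIA runtime path work with a plain CUDA image?
docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi

# 3. If step 2 also fails, reinstalling the toolkit and restarting the
#    Docker daemon is the usual next step.
sudo systemctl restart docker

If step 2 succeeds but the TorchServe container still reports no GPUs, the problem is more likely in how that particular container is launched.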

0 reactions
chauhang commented on May 10, 2020

@maaquib Please see the comments left in the commit and address them.

Read more comments on GitHub.

Top Results From Across the Web

Using GPU from a docker container? - cuda - Stack Overflow
Run Docker container with nvidia driver pre-installed ... Essentially they have found a way to avoid the need to install the CUDA/GPU driver ...
Using Your GPU in a Docker Container - Roboflow Blog
The NVIDIA Container Toolkit is the solution to configure your GPU within a Docker container. Follow this step-by-step guide to get started.
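
For reference, the Ubuntu install sequence for the NVIDIA Container Toolkit that was commonly documented around the time of this issue looked roughly like the following (a sketch based on the nvidia-docker instructions; the repository URLs may have changed since, so prefer NVIDIA's current install guide):

# Add NVIDIA's package repository for this Ubuntu release.
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart Docker so the runtime hook is picked up.
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
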
Enabling GPU access with Compose - Docker Documentation
Compose services can define GPU device reservations if the Docker host contains such devices and the Docker Daemon...
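
If you go the Compose route described above, the GPU request lives under deploy.resources.reservations.devices. A sketch, written as a shell heredoc so it stays copy-pasteable, assuming a Compose version that supports device reservations (docker-compose 1.28+ or the docker compose plugin) and the same assumed torchserve:gpu image tag:

cat > docker-compose.yml <<'EOF'
services:
  torchserve:
    image: torchserve:gpu           # assumed local GPU build tag
    ports:
      - "8080:8080"
      - "8081:8081"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an integer to limit how many GPUs
              capabilities: [gpu]
EOF

docker compose up                   # docker-compose up on older installs
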
How to Use an NVIDIA GPU with Docker Containers
At a high level, getting your GPU to work is a two-step procedure: install the drivers within your image, then instruct Docker to...
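
To make the two-step idea above concrete: strictly speaking the kernel driver stays on the host, and the image only needs the CUDA user-space libraries, which usually come from an nvidia/cuda base image. A minimal smoke-test sketch (the base image tag and file names are assumptions):

cat > Dockerfile.gpu-test <<'EOF'
# CUDA user-space libraries come from the base image; the kernel driver
# is mounted in from the host by the NVIDIA Container Toolkit at run time.
FROM nvidia/cuda:10.2-base
CMD ["nvidia-smi"]
EOF

docker build -f Dockerfile.gpu-test -t gpu-test .
docker run --rm --gpus all gpu-test
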
Installing Docker and The Docker Utility Engine for NVIDIA ...
The NVIDIA Container Toolkit allows users to build and run GPU accelerated Docker containers. The toolkit includes a container runtime ...
