
Right way to specify GPU memory in DDPPO.

See original GitHub issue

Hi,

What parameters do we need to change in order to utilize all GPU resources?

I’m reproducing the DDPPO results using the single_node.sh script. I have a GPU with 12 GB of VRAM, as shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:05:00.0 Off |                  N/A |
| 22%   59C    P8    30W / 250W |   4501MiB / 12210MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+

Only about 4.5 GB of the VRAM is in use, with 4 worker processes running:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17009      G   ...a/miniconda3/envs/habitatapi/bin/python  1382MiB |
|    0     17010      G   ...a/miniconda3/envs/habitatapi/bin/python  1375MiB |
|    0     17011      G   ...a/miniconda3/envs/habitatapi/bin/python  1377MiB |
|    0     17012      G   ...a/miniconda3/envs/habitatapi/bin/python   355MiB |
+-----------------------------------------------------------------------------+

How can I utilize the remaining ~8 GB to speed up training? In ddppo_pointnav.yaml, NUM_PROCESSES = 4. Should I increase this parameter, or are there other configurations I need to change as well?
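(As a rough sanity check before touching the config: dividing the free VRAM by the per-worker footprint that nvidia-smi reports gives an upper bound on how many extra workers might fit. The sketch below is only illustrative; it assumes PyTorch’s torch.cuda.mem_get_info is available and uses the ~1.4 GiB per-worker figure read off the process list above.)

import torch

# Free and total device memory in bytes (PyTorch >= 1.10).
free_bytes, total_bytes = torch.cuda.mem_get_info(0)

# Approximate per-worker footprint, read off the nvidia-smi process list above.
per_worker_bytes = 1400 * 1024 ** 2   # ~1.4 GiB per simulator process

print(f"free: {free_bytes / 1024 ** 3:.1f} GiB of {total_bytes / 1024 ** 3:.1f} GiB")
print(f"room for roughly {free_bytes // per_worker_bytes} more workers of this size")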

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
erikwijmans commented, Mar 19, 2020

This all seems correct to me.

“For both models I’m getting a framerate of around 50 fps and it seems a bit low to me.”

The frame rates reported by the training script include the simulation time, the model inference time, and the parameter update time, so they will be considerably lower than the simulator alone.
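To see how those three phases contribute, one can time each of them separately in the training loop. The snippet below is a self-contained toy, not the habitat_baselines trainer; the sleep durations are made-up stand-ins for environment stepping, the policy forward pass, and the PPO update.

import time
from collections import defaultdict

timings = defaultdict(float)

def timed(name, fn):
    # Run fn once and accumulate its wall-clock time under `name`.
    start = time.perf_counter()
    fn()
    timings[name] += time.perf_counter() - start

# Made-up per-frame costs: simulation, inference, parameter update.
def simulate_step(): time.sleep(0.010)
def run_inference(): time.sleep(0.004)
def run_update():    time.sleep(0.006)

frames = 100
for _ in range(frames):
    timed("simulation", simulate_step)
    timed("inference", run_inference)
    timed("update", run_update)

total = sum(timings.values())
print({k: round(v, 2) for k, v in timings.items()})
print(f"simulation-only fps: {frames / timings['simulation']:.0f}")
print(f"reported (end-to-end) fps: {frames / total:.0f}")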

1 reaction
erikwijmans commented, Feb 16, 2020

Yes, you can increase the number of processes. With that said, I am surprised that there is no GPU memory being used by CUDA (it would show as type C). Is the model on the CPU? If so, I would highly recommend using the remaining GPU memory for the model and the forward/backward passes.
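A quick way to check whether the model is on the CPU is to look at its parameters’ device and move it to the GPU if needed. This is a generic PyTorch sketch with a placeholder network, not the actual habitat_baselines policy:

import torch
import torch.nn as nn

# Placeholder network standing in for the DD-PPO policy.
policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 4))

print(next(policy.parameters()).device)       # cpu -> forward/backward run on the CPU

if torch.cuda.is_available():
    policy = policy.to("cuda:0")              # move the parameters onto the GPU
    print(next(policy.parameters()).device)   # cuda:0; the process now shows as type C in nvidia-smi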


