Right way to specify GPU memory in DDPPO.
Hi,
What parameters do we need to change in order to utilize all GPU resources?
I’m reproducing the DDPPO results using the single_node.sh script. I have a GPU with 12GB of VRAM, as shown below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116 Driver Version: 390.116 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:05:00.0 Off | N/A |
| 22% 59C P8 30W / 250W | 4501MiB / 12210MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
Only about 4.5GB of the VRAM is utilized, with 4 worker processes running:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 17009 G ...a/miniconda3/envs/habitatapi/bin/python 1382MiB |
| 0 17010 G ...a/miniconda3/envs/habitatapi/bin/python 1375MiB |
| 0 17011 G ...a/miniconda3/envs/habitatapi/bin/python 1377MiB |
| 0 17012 G ...a/miniconda3/envs/habitatapi/bin/python 355MiB |
+-----------------------------------------------------------------------------+
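One way to size this up is to query the free VRAM programmatically and divide by the per-worker footprint from the table above (roughly 1400MiB per worker). Below is a minimal sketch using the pynvml package, which wraps the same NVML library that nvidia-smi reads from; pynvml is an assumption here, not part of the original setup.

import pynvml

# Query total/used/free memory (in bytes) on GPU 0 via NVML.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)

per_worker = 1400 * 1024 ** 2  # ~1400MiB per worker, read off the table above
print(f"free: {info.free / 1024 ** 2:.0f}MiB, "
      f"room for roughly {info.free // per_worker} more workers")

pynvml.nvmlShutdown()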
How can I utilize the remaining 8GB to speed up training? In ddppo_pointnav.yaml, NUM_PROCESSES = 4. Should I increase this parameter, or are there other configurations I need to change as well?
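For reference, a hedged sketch of overriding that value from Python, assuming the yacs-based config API of older habitat-baselines releases (get_config plus defrost/freeze); the config path and field name may differ in your version.

from habitat_baselines.config.default import get_config

# Load the DD-PPO pointnav config and raise the worker count.
config = get_config("habitat_baselines/config/pointnav/ddppo_pointnav.yaml")
config.defrost()
config.NUM_PROCESSES = 8  # e.g. 8 workers, if ~1400MiB each fits in free VRAM
config.freeze()

The same override can usually also be appended as trailing KEY VALUE pairs on the run.py command line (e.g. NUM_PROCESSES 8), though the exact invocation depends on the habitat-baselines version.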
Issue Analytics
- State:
- Created 4 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
This all seems correct to me.
The frame rates reported by the training script include the simulation time, the model inference time, and the parameter update time, so they will be considerably lower than the frame rate of the simulator alone.
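As a back-of-the-envelope illustration of that breakdown (the timings below are invented, not measured):

# Hypothetical per-frame timings in seconds, purely for illustration.
t_sim, t_infer, t_update = 0.010, 0.005, 0.005

raw_sim_fps = 1.0 / t_sim                          # 100 FPS: simulator alone
training_fps = 1.0 / (t_sim + t_infer + t_update)  # 50 FPS: full training loop
print(f"simulator-only: {raw_sim_fps:.0f} FPS, training: {training_fps:.0f} FPS")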
Yes, you can increase the number of processes. That said, I am surprised that no GPU memory is being used by CUDA (it would show as type C in nvidia-smi); is the model on the CPU? If so, I would highly recommend using the remaining GPU memory for the model and the forward/backward passes.
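A quick way to check is to inspect the device of the policy's parameters and move the model if needed. A minimal PyTorch sketch, where the linear layer is just a placeholder standing in for the actor-critic policy:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder for the actual actor-critic policy

# Parameters report which device the model lives on.
print(next(model.parameters()).device)  # "cpu" if the model was never moved

if torch.cuda.is_available():
    model = model.to("cuda:0")  # run forward/backward passes on the GPU
    print(next(model.parameters()).device)  # now "cuda:0"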